Path Planning For Unmanned Surface Vehicle Based On Improved Q-Learning Algorithm

This paper presents an improved Q-Learning algorithm, named neural network smoothing and fast Q-Learning (NSFQ), for efficient path planning and obstacle avoidance in unmanned surface vehicles (USVs). The NSFQ algorithm enhances convergence speed by integrating a radial basis function neural network, optimizing the action space and reward function, and utilizing a third-order Bezier curve for path smoothing. Simulation results demonstrate that the NSFQ algorithm outperforms traditional algorithms like A* and RRT in various evaluation metrics such as path length and smoothness.


Ocean Engineering 292 (2024) 116510

Contents lists available at ScienceDirect

Ocean Engineering
journal homepage: www.elsevier.com/locate/oceaneng

Research paper

Path planning for unmanned surface vehicle based on improved Q-Learning algorithm

Yuanhui Wang a,b, Changzhou Lu a, Peng Wu a,b,*, Xiaoyue Zhang a,b
a College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street 145, Harbin, 150001, China
b Sanya Nanhai Innovation and Development Base of Harbin Engineering University, China

ARTICLE INFO

Handling Editor: Prof. A.I. Incecik

Keywords: Q-learning; Unmanned surface vehicle; Path planning; Reinforcement learning; RBF neural network

ABSTRACT

Path planning is a key factor for the unmanned surface vehicle (USV) to achieve efficient navigation. In this paper, to solve the global path planning and obstacle avoidance problems for the USV, an improved Q-Learning algorithm called neural network smoothing and fast Q-Learning (NSFQ) is proposed. The proposed algorithm comprises three main improvements. Firstly, the radial basis function (RBF) neural network is combined with the Q-Learning algorithm to approximate the action value function Q, which improves the convergence speed of the Q-Learning algorithm. Secondly, to ensure that the planned path conforms to the maneuvering characteristics of the USV, the heading angle, motion characteristics, ship length, and safety of the USV are taken into account by the proposed algorithm. Based on these factors, the action space and reward function are optimized, the state space is reconstructed, and the safety threshold is proposed. Finally, a third-order Bezier curve is used to smooth the initial path, so that the USV can maintain its heading stability during navigation. Based on simulation results, the proposed NSFQ algorithm outperforms the A* and RRT algorithms in terms of evaluation indicators such as heading angle, angular velocity, path length, sailing time, and path smoothness.

* Corresponding author. College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street 145, Harbin, 150001, China.
E-mail address: [email protected] (P. Wu).

https://doi.org/10.1016/j.oceaneng.2023.116510
Received 2 September 2023; Received in revised form 18 November 2023; Accepted 1 December 2023
Available online 21 December 2023
0029-8018/© 2023 Elsevier Ltd. All rights reserved.

1. Introduction

With the continuous advancement of artificial intelligence technology, the research and application of unmanned systems, such as unmanned vehicles, unmanned aerial vehicles (UAVs), and unmanned surface vehicles (USVs), have undergone tremendous development (Wang et al., 2018). Among them, the research on unmanned vehicles is mature, while the research on USVs is currently relatively limited due to the complexity and uncertainty of the marine environment. USVs are widely used in marine resource exploration (Jin et al., 2018b), marine rescue (Li et al., 2023), marine environmental monitoring (Fu and Khan, 2020), collaborative combat (Owen et al., 2021), and other fields (Cheng et al., 2021; Jin et al., 2018a). In practical applications, one of the key factors determining whether the USV can achieve effective navigation is path planning, which to some extent indicates the level of intelligence of the USV (Sun et al., 2020).

1.1. Related work

The path planning of the USV refers to planning an obstacle-free path from the initial position to the target position in a marine environment with obstacles based on certain constraints and target requirements (Mac et al., 2016).

Currently, the A* algorithm, the rapidly exploring random tree (RRT) algorithm, the artificial potential field (APF), the particle swarm optimization (PSO) algorithm, the ant colony algorithm (ACA), the genetic algorithm (GA), and other algorithms (Ozturk et al., 2022; Yao et al., 2023) have been widely used in USV path planning. The A* algorithm (Singh et al., 2018) is based on the Dijkstra algorithm (Wang et al., 2019a) with the addition of heuristic functions (Hart et al., 1968). Although the algorithm improves the computational efficiency, the planned path suffers from many turns, low smoothness, and a small safety distance. The RRT algorithm (Zhao et al., 2022) is a path planning algorithm based on random sampling that has strong search capability and fast search speed. However, because the sampling points are randomly generated, the generated path is not smooth enough, has too many unnecessary turns, and is too long. The APF algorithm (Sang et al., 2021) is a dynamic path planning algorithm with good real-time performance and strong adaptability to dynamic environments. It is often used in combination with other
algorithms. However, the algorithm has the disadvantage of being prone to falling into local optimal solutions. In addition, more intelligent optimization algorithms have been proposed and applied to path planning, including representative swarm intelligence algorithms such as ACA (Ntakolia and Lyridis, 2022), GA (Xin et al., 2019), and PSO. These swarm intelligence algorithms have advantages in solving complex problems, but they also suffer from long computation time and slow convergence speed, and they easily fall into local minima.

All of the above algorithms have their own advantages and disadvantages, and they all need to assume complete environmental information. However, there is rarely prior knowledge of marine environments (Wang et al., 2018). Reinforcement learning algorithms do not require any human knowledge or default rules (Lan et al., 2022). The Q-Learning algorithm is one of the classic reinforcement learning algorithms and has strong robustness and adaptability to uncertain environments (Watkins and Dayan, 1992). The classical Q-Learning (CQL) algorithm has shortcomings, such as long learning time, low exploration efficiency, and slow convergence speed (Duan and Chen, 2019; Xu and Yuan, 2019). Many studies have improved the classical Q-Learning algorithm (Chen et al., 2019) to address its limitations, and such improvement is also the focus of this paper.

The literature on improving the Q-Learning algorithm can be broadly divided into the following three categories. Firstly, many studies have improved the basic elements of Q-Learning algorithms, such as the reward function (Marthi, 2007), the action space (Bianchi et al., 2008), and the value function Q (Low et al., 2019), to improve the learning and exploration efficiency of the Q-Learning algorithm. However, these improvements did not significantly improve the convergence speed or learning time of the algorithm. Secondly, prior knowledge can be introduced to provide the Q-Learning algorithm with additional information to improve convergence speed, so many scholars have also utilized prior knowledge to improve the Q-Learning algorithm (Hao et al., 2023). Yang et al. (2022) proposed a global path planning algorithm based on the double deep Q network (DDQN), utilized prior knowledge to select the action space, and showed that the generated path has better performance. In addition, neural networks can be incorporated into the Q-Learning algorithm to enhance its performance and efficiency (Wang et al., 2019b). Wang (2021) introduced artificial neural networks into the Q-Learning algorithm, using a back propagation (BP) neural network to approximate the Q function to solve the global path planning problem of automated guided vehicles (AGV).

However, most of the improvements and optimizations of the Q-Learning algorithm proposed by the aforementioned scholars are only applicable to the path planning problems of mobile robots and are not suitable for path planning of the USV. The dynamic characteristics, efficiency, and rules of the ship are not taken into account by the above algorithms. Moreover, the USV needs to achieve autonomous driving and avoid the input of human knowledge as much as possible during path planning. The Q-Learning algorithm does not require any human knowledge or default rules and is characterized by strong robustness and adaptability to uncertain environments. Meanwhile, the paths planned by the above algorithms do not meet the requirements of USV maneuverability and safety, and the planned path length and smoothness are not ideal. Therefore, in this paper, a neural network smoothing and fast Q-Learning (NSFQ) algorithm is proposed to solve the global path planning and obstacle avoidance problems for the USV. The heading angle, angular velocity, length of the USV, and other indicators of the USV, as well as the maneuverability and motion characteristics of the USV, are taken into account by the proposed algorithm.

1.2. Contributions

The NSFQ algorithm proposed in this paper is applied to solve the global path planning and obstacle avoidance problems for the USV, with the following main contributions.

1) To improve the convergence speed of the Q-Learning algorithm, the radial basis function (RBF) neural network is combined with the Q-Learning algorithm to approximate the action value function Q. Moreover, the heading angle and turning performance of the USV are taken into account by the proposed algorithm, the action space and reward function are optimized, and the state space is reconstructed.
2) In response to the obstacle avoidance problem of the USV, a safety threshold equal to twice the length of the USV is proposed to ensure the safety of the USV. In addition, a third-order Bezier curve is used to smooth the initial path so that the USV can maintain its heading stability during navigation.
3) By comparing with other algorithms (Wang et al., 2019c; Zhao et al., 2022), this paper demonstrates that the NSFQ algorithm outperforms other algorithms in terms of evaluation indicators such as path length, sailing time, heading angle, angular velocity, and path smoothness.

To better introduce the NSFQ algorithm and demonstrate the effectiveness and superiority of the proposed algorithm, the remaining sections of this paper are organized as follows. In Section 2, the model of the USV and the shortcomings of the classical Q-Learning algorithm are outlined. In Section 3, the specific content of the proposed NSFQ algorithm is introduced. In Section 4, the comparative experiments and simulation results of USV path planning in different simulation environments are presented. In Section 5, conclusions are drawn based on simulation experiments and results.

2. Preliminaries

In this section, the model of the USV is first introduced, then the Q-Learning algorithm is summarized, and some problems of the Q-Learning algorithm in USV path planning are outlined. This prepares for the NSFQ algorithm proposed in Section 3.

2.1. Model of the USV

The horizontal kinematics model of the USV can be represented by Eq. (1).

\[
\begin{cases}
\dot{x} = u\cos\psi - v\sin\psi \\
\dot{y} = u\sin\psi + v\cos\psi \\
\dot{\psi} = r
\end{cases}
\tag{1}
\]

where $x$, $y$ are the position of the USV on the horizontal plane in the northeast coordinate system; $\psi$ is the heading angle of the USV; $u$, $v$ and $r$ are the longitudinal speed, the transverse speed, and the heading angular velocity of the USV in the hull coordinate system, respectively. The kinematic coordinate system of the USV is established as shown in Fig. 1.

2.2. Classical Q-Learning algorithm

The Q-Learning algorithm is a classical reinforcement learning algorithm, which is an off-policy learning method to estimate the state-action value function. Its basic idea is that the USV perceives the current environmental state $s_t$ and evaluates the reward $R_t$ obtained by a possible action $a_t$, aiming to maximize the cumulative reward of the USV in its interaction with the environment.

The reinforcement learning principle can be seen in Fig. 2. $s_t$ is the state of the USV at time $t$, and $a_t$ is the action performed by the USV at time $t$ in the environment. Generally, the action is selected by the $\varepsilon$-greedy strategy, where $\varepsilon$ is the exploration factor. The next state $s_{t+1}$ of the environment is obtained by the action $a_t$; meanwhile, the environment
generates a new feedback $R_{t+1}$ in the next state $s_{t+1}$. The next action $a_{t+1}$ of the USV is determined by $s_{t+1}$ and $R_{t+1}$, and this process is repeated until the end of the iteration.

Fig. 1. Kinematic coordinate system of the USV.

Fig. 2. Reinforcement learning principle.

Q-Learning is an iterative algorithm based on a value function that is used to explore an unknown environment to find the optimal action. The optimal action is approximated by Q-Learning, which constantly updates the state-action value function $Q(s_t, a_t)$ during iteration. The Q-values of the Q-Learning algorithm are updated by Eq. (2).

\[
Q(s_t, a_t) \leftarrow (1 - \alpha)Q(s_t, a_t) + \alpha\left[R_t + \gamma \max_{a \in A} Q(s_{t+1}, a_{t+1})\right]
\tag{2}
\]

where $\alpha \in [0, 1]$ is the learning rate parameter, $R_t$ is the reward of state $s$ at time $t$, and $\gamma \in [0, 1]$ is the discount factor. The learning process of the classical Q-Learning algorithm is shown in Algorithm 1.

Algorithm 1. Classical Q-Learning algorithm

The main disadvantages of the classical Q-Learning algorithm applied to USV path planning are as follows.

1) The construction of the action space of the classical Q-Learning algorithm does not meet the maneuvering characteristics of the USV. The heading angle of the USV is limited to only four values, 0°, ±90°, and 180°, thereby leading to excessive turning angles.
2) In the complex marine environment, the dimension of the Q-table of the classical Q-Learning algorithm increases exponentially with the increase of the state dimension, thus reducing the computational efficiency.
3) The classical Q-Learning algorithm uses grid-based modeling, but due to the relatively complex and large-scale marine environment, this method is not suitable for path planning of the USV. In addition, the design of the action space is limited by the state space based on the grid method.

To address the limitations mentioned above, an NSFQ algorithm based on the Q-Learning algorithm is proposed in this paper. The details of this algorithm are discussed in Section 3.

3. The proposed NSFQ algorithm and path smoothing

In this section, the proposed NSFQ algorithm is elaborated in four stages. Firstly, the basic elements of the Q-Learning algorithm are improved. Secondly, the safety threshold $\rho$ is proposed to enlarge the prohibited area covering obstacles. Thirdly, a third-order Bezier curve is used to smooth the initial path. Finally, the specific process of the NSFQ algorithm proposed in this paper is given.

3.1. The basic elements of the Q-Learning algorithm

In this subsection, the basic elements of the Q-Learning algorithm are improved, including the action space, the state space, the action value function $Q(s_t, a_t)$, and the reward function.

3.1.1. Action space

The action of the USV is in effect the size of its heading angle during obstacle avoidance or path planning. When the path of the USV is planned by the Q-Learning algorithm, the state of the USV in real self-navigation is continuous, so the observation behavior of the USV needs to be discretized to obtain a discrete action space. In this paper, the action space of the USV is optimized by considering the heading angle. Obstacles can be avoided by the USV under the current heading by turning the rudder to any angle of the action space. The action space is defined relative to the current heading of the USV.

The action space of the classical Q-Learning algorithm can be defined by Eq. (3):

\[
Action = [1, 2, 3, 4]
\tag{3}
\]

where 1, 2, 3 and 4 respectively represent the four discrete actions of the USV: right, up, left and down. The classical Q-Learning algorithm only has these four discrete actions, which are determined by the grid-based modeling.

In this paper, the right side is taken as the default forward direction of the USV, and its heading angle is 0°. The heading angle of the USV can be obtained by Eq. (4):
\[
\psi_A = [0^\circ, 90^\circ, 180^\circ, -90^\circ]
\tag{4}
\]

To shorten the path length of the USV while taking its heading angle control and navigation safety into account, the action space of the Q-Learning algorithm is improved. The improved action space adds search behavior in the diagonal directions, so the action space of the USV is increased to 8 discrete actions.

The improved action space of the USV can be defined by Eq. (5):

\[
Action = [1, 2, 3, 4, 5, 6, 7, 8]
\tag{5}
\]

where 1, 2, 3, 4, 5, 6, 7 and 8 respectively represent USV forward, turn left 45°, turn left 90°, turn left 135°, backward, turn right 135°, turn right 90°, and turn right 45°.

In this case, the heading angle of the USV can be obtained by Eq. (6):

\[
\psi_A = [0^\circ, 45^\circ, 90^\circ, 135^\circ, 180^\circ, -135^\circ, -90^\circ, -45^\circ]
\tag{6}
\]

The action space before and after the improvement is compared in Fig. 3. The left figure represents the action space of the Q-Learning algorithm, and the right figure represents the improved action space.

Fig. 3. Schematic diagram of action space before and after improvement.

3.1.2. State space

The classical Q-Learning algorithm uses grid-based modeling, so the size of its state space can be determined by the number of grids. However, in this paper, the grid-based modeling method is not employed in the NSFQ algorithm. Therefore, the state space of the algorithm must be reconstructed.

During the navigation of the USV, the longitudinal speed of the USV is $u$, and the transverse speed is $v = 0$. Therefore, if the starting point coordinate position $(x(0), y(0))$, the ending point coordinate position $(x(e), y(e))$ and the heading angle $\psi$ at a certain time are known, the position of the USV at any time can be obtained. In that case, Eq. (1) can be converted to Eq. (7):

\[
\begin{cases}
x(t+1) = x(t) + \int_{t}^{t+1} u\cos\psi \, dt \\
y(t+1) = y(t) + \int_{t}^{t+1} u\sin\psi \, dt \\
r = \dot{\psi}
\end{cases}
\tag{7}
\]

The position coordinates of the USV at each moment can be calculated by Eq. (7). The state space of the Q-Learning algorithm is composed of the position coordinates of the USV during navigation. The heading angle at each moment is also taken into account by the NSFQ algorithm, so its state space is composed of the position coordinates and heading angle of the USV at each moment. The state space is defined as:

\[
S = [x, y, \psi]
\tag{8}
\]

where $P = [x, y]$ is the position coordinates of the USV, and $\psi$ is the heading angle of the USV.

3.1.3. The action value function Q(s_t, a_t)

The Q-table is used by the classical Q-Learning algorithm to describe the state-action value function $Q(s_t, a_t)$, and learning is realized by storing and updating the Q-table (Wei and Jin, 2019). However, the working environment of the USV is relatively complex, and the number of learning parameters increases exponentially with the increase of the state dimension. That is, a large amount of memory space is occupied by the Q-table, which reduces the computational efficiency and causes the "dimension disaster" (Wang, 2021).

In this paper, the RBF neural network is used instead of the Q-table. It is used to approximate the action value function $Q(s_t, a_t)$ of the Q-Learning algorithm. The RBF neural network has strong function approximation ability, making it faster and more efficient to approximate the real Q function. As a result, the RBF neural network does not suffer from the problem of catastrophic forgetting. The network structure of the RBF-based Q-Learning algorithm is shown in Fig. 4.

Fig. 4. Approximation of action value function Q based on RBF neural network.

The RBF neural network is a three-layer static feed-forward network, including an input layer, a hidden layer, and an output layer.

The first layer is the input layer, which is composed of signal source nodes. Its input is composed of the state variables $(s_1, s_2, \ldots, s_N)$ and an action variable $a$. The number of inputs is $M = N + 1$, and the input vector is:

\[
X = [s_1, s_2, \ldots, s_N, a]
\tag{9}
\]

The second layer is the hidden layer, and $\varphi_k(X)$, $(k = 1, 2, \ldots, M)$ is the basis function. Generally, the M-dimensional Gaussian function is selected, and the expression of the k-th RBF node is given by Eq. (10):

\[
\varphi_k(X) = \exp\left(-\sum_{k=1}^{M} \frac{(s_k - \mu_k)^2}{2\sigma_k^2}\right)
\tag{10}
\]

where $\sigma_k$ is the variance of the Gaussian excitation function of the k-th hidden neuron, and $\mu_k$ is the Gaussian function clustering center of the k-th hidden layer node. Assume that the target point coordinate is $(x_e, y_e)$ and the current position coordinate of the USV is $(x_k, y_k)$. Their expressions can be obtained by Eqs. (11) to (13).

\[
Distance = \sqrt{(x_k - x_e)^2 + (y_k - y_e)^2}
\tag{11}
\]

\[
\sigma_k = \frac{Distance_{max}}{\sqrt{2N}}
\tag{12}
\]

\[
\mu_k = \frac{1}{M}\sum_{k=1}^{M} s_k
\tag{13}
\]

The third layer is the output layer, which responds to the input pattern. The action value function $Q(s_t, a_t)$ of the Q-Learning algorithm is approximated by only one output node. The expression of the
action value function $Q(s_t, a_t)$ can be obtained by Eq. (14):

\[
Q(s_t, a_t) = \sum_{k=1}^{t} \omega_k \varphi_k(s_1, s_2, \ldots, s_t, a)
\tag{14}
\]

where $\omega_k$ is the weight value, and the size of $\omega_k$ can be represented as follows:

\[
\omega_k = \frac{1}{Distance}
\tag{15}
\]

3.1.4. Reward function

In reinforcement learning, the reward function plays an important role in the behavior decision-making and obstacle avoidance of the USV (Jin et al., 2018a). Since the environment state set and the action set of the USV in the Q-Learning algorithm are limited, whereas the motion of the USV is continuous during actual navigation, the reward function is generalized into a nonlinear piecewise function, as shown in Eq. (16).

\[
R =
\begin{cases}
-50, & s \in R_{Obstacle} \\
100, & s \in R_{Goal} \\
-20 + \mu \cdot L_{s \to Goal}, & s \in R_{Safe}
\end{cases}
\tag{16}
\]

where $R_{Obstacle}$ is the obstacle area, $R_{Safe}$ is the safe area, and $R_{Goal}$ is the target point. $\mu$ is a very small positive number, and its size should be determined by the planned path length. $L_{s \to Goal}$ is the distance from the current position to the target point. It can be seen from Eq. (16) that the reward value of the safe area near the target point is relatively large, while the reward value of the safe area far from the target point is relatively small.

3.2. The safety threshold

To ensure the safety of the USV, a safety threshold $\rho$ is proposed that is twice the length of the USV. The envelope line enlarged by the safety threshold is used as the prohibited area to cover obstacles. The prohibited area is the area covered by obstacles, which is the area where the navigation of the USV is prohibited. In marine environments, most obstacles are irregularly shaped, which can greatly complicate the environmental modeling process. To simplify the processing, circular and quadrilateral envelope surfaces are used to envelop these obstacles.

3.2.1. Circular prohibited area

Circular envelope surfaces can be used to simplify the complexity of modeling for relatively small irregular obstacles. Fig. 5 shows the circular prohibited area before and after the safety threshold $\rho$ is introduced. The circular prohibited area without the safety threshold $\rho$ is represented by the solid line circle, while the prohibited area after introducing the safety threshold $\rho$ is represented by the dashed line circle.

The circular prohibited area without the safety threshold $\rho$ is expressed by Eq. (17):

\[
(x - x_0)^2 + (y - y_0)^2 \le r^2
\tag{17}
\]

The circular prohibited area after introducing the safety threshold $\rho$ is expressed by Eqs. (18) to (20) as follows:

\[
R = r + \rho
\tag{18}
\]

\[
\rho = 2 \cdot L_{USV}
\tag{19}
\]

\[
(x - x_0)^2 + (y - y_0)^2 \le R^2
\tag{20}
\]

where $r$ is the radius of the solid line circle without the safety threshold $\rho$, $R$ is the radius of the dashed line circle with the safety threshold $\rho$, $(x_0, y_0)$ is the center of the prohibited area, and $L_{USV}$ is the length of the USV.

Fig. 5. The circular prohibited area with safety threshold ρ.

3.2.2. Quadrilateral prohibited area

Quadrilateral envelope surfaces can be used to simplify the complexity of modeling for relatively large irregular obstacles. Fig. 6 shows the quadrilateral prohibited area before and after the safety threshold $\rho$ is introduced. The quadrilateral prohibited area without the safety threshold $\rho$ is represented by the solid line quadrilateral, while the quadrilateral prohibited area after introducing the safety threshold $\rho$ is represented by the dashed line quadrilateral.

Fig. 6. The quadrilateral prohibited area with safety threshold ρ.

The solid line quadrilateral is enclosed by four intersecting lines $l_1$, $l_2$, $l_3$ and $l_4$. The four intersecting lines can be expressed as:

\[
\prod_{i=1}^{4} l_i : [y_i(t) - k_i x(t) - b_i] = 0
\tag{21}
\]

where $i = 1, 2, 3, 4$ is the serial number of the straight lines of the quadrilateral, $k_1$, $k_2$, $k_3$ and $k_4$ are the slopes of straight lines $l_1$, $l_2$, $l_3$ and $l_4$ respectively, and $b_1$, $b_2$, $b_3$ and $b_4$ are the intercepts of lines $l_1$, $l_2$, $l_3$ and $l_4$ respectively on the y-axis.

The quadrilateral prohibited area without the safety threshold $\rho$ is expressed by Eq. (22):
Y. Wang et al. Ocean Engineering 292 (2024) 116510



⎪ y(t) − y1 (t)≥ 0 requirements of the path are taken into account, and the third-order

y(t) − y2 (t)≤ 0 Bezier curve is used to smooth the initial path in this paper.
(22)
⎪ y(t) −
⎪ y3 (t)≥ 0 The equation for each point on a third-order Bezier curve is defined

y(t) − y4 (t)≤ 0 as follows:
The dashed line quadrilateral introducing the safety threshold ρ is B(t) = (1 − t)3 P00 +3t(1 − t)2 P01 +3t2 (1 − t)P02 + t3 P03 , t ∈ [0, 1]
enclosed by four intersecting lines ln1 , ln2 , ln3 and ln4 . The new four
intersecting lines can be expressed as: where, P00 , P01 , P02 and P03 represent three consecutive initial control
4 points.
Π lni : [yni (t) − ki x(t) − bni ]= 0 (23) The optimization process of the third-order Bezier curve is shown in
i=1
Fig. 7, with four control points selected. The initial control points P01 and
where, bn1 , bn2 , bn3 and bn4 are the intercept of lines ln1 , ln2 , ln3 and ln4 P02 can be smoothed by the third-order Bezier curve. The black line is the
respectively on the y-axis. initial path, and the red curve is the smoothed path of the third-order
Due to the introduction of the safety threshold ρ, the intercept of the Bezier curve.
straight line on the y-axis changes by:
Δb = ρ / cos(arctan ki ) (24) 3.4. USV path planning process based on NSFQ algorithm

The intercept of new straight lines ln1 , ln2 , ln3 and ln4 on the y-axis can The flow chart of USV path planning based on the NSFQ algorithm is
be expressed by Eq. (25): shown in Fig. 8, and the specific steps are as follows.


⎪ bn1 = b1 − Δb

bn2 = b2 + Δb Step 1: Generate the obstacle environment. Randomly generated
(25) obstacle maps are used in USV path planning. Moreover, the starting
⎪ bn3 = b3 − Δb


bn4 = b4 + Δb and ending coordinates are determined.
Step 2: The envelope line of the safety threshold ρ is introduced as the
The quadrilateral prohibited area after introducing the safety prohibited area to cover obstacles.
threshold ρ is expressed by Eq. (26) as follows: Step 3: Initialization. The relevant parameter values of the algorithm

⎪ y(t) − yn1 (t)≥ 0 are set, including learning rate α, discount factor γ, exploration factor


y(t) − yn2 (t)≤ 0 ε, maximum number of iterations N, maximum number of explora­
(26)
⎪ y(t) − yn3 (t)≥ 0
⎪ tion steps per iteration and other related parameter values.

y(t) − yn4 (t)≤ 0 Step 4: The action space of USV can be designed by Eq. (6). The RBF
Neural Network is introduced to approximate the real Q-value, and
When the slope of an edge does not exist, the inequality constraint form the reward function R is designed according to the target point and
of the edge becomes a constraint on x(t). obstacle information by Eq. (16).
Eq. (22) can be transformed into: Step 5: The initial state sstart is composed of the starting point and
initial heading of the USV according to Eq. (7).

Step 6: According to the ε-greedy strategy, the action $a$ of the USV in state $s_t$ is selected. The corresponding instant reward $R$ and the next state $s_{t+1}$ are obtained, together with the position coordinates and heading of the USV in $s_{t+1}$.
Step 7: The RBF neural network is used to approximate the Q-value by Eq. (14), and the Q-value is updated by Eq. (2).
Step 8: Once the Q-value update for the current state is completed, the state transitions as $s_t \leftarrow s_{t+1}$.
Step 9: The state $s$ is judged. If the USV is located in the obstacle area or the target area, increment the iteration number $N$ by 1 and proceed to step 5 for a new round of learning. If it is located in a feasible area, proceed to step 6 to continue learning.
Step 10: Determine whether the maximum number of iterations has been reached. If not, proceed to step 6. If it has been reached, proceed to step 11.
Step 11: Once the Q-value has converged, the optimal path strategy $\pi^*(s) = \arg\max_a Q(s, a)$ is output.
Step 12: A third-order Bezier curve is used to smooth the initial path.

$$\prod_{i=1}^{4} \left\{ \min\left[ 0,\ (-1)^i \left( x(t) - x_i(t) \right) \right] \right\} \le 0 \tag{27}$$

Eq. (26) can be transformed into:

$$\prod_{i=1}^{4} \left\{ \min\left[ 0,\ (-1)^i \left( x(t) - x_i(t) - (-1)^i \rho \right) \right] \right\} \le 0 \tag{28}$$

where min(·) is the minimum function, and $x_i(t)$ is the expression for line $l_i$ when the slope of line $l_i$ does not exist.

3.3. Path smoothing

The action space is enhanced by incorporating heading information, which increases the number of discrete search actions of the USV to 8. Because the action space is limited, the planned initial path is non-smooth, with numerous unnecessary turns and broken-line connections. Taking into account the heading angle of the USV and its continuity in actual navigation, this paper uses the Bezier curve to smooth the initial path. Based on Eq. (29), the expressions for the first-order to third-order Bezier curves are shown in Table 1.
$$P_i^k = \begin{cases} P_i^0, & k = 0 \\ (1 - t)P_i^{k-1} + tP_{i+1}^{k-1}, & k = 1, 2, \ldots, n \end{cases} \qquad i = 0, 1, \ldots, n - k \tag{29}$$

where $t \in [0, 1]$ is the proportional coefficient, $P_i^0$ is the i-th initial control point, and $P_i^k$ is the i-th k-order control point (k = 1, 2, ..., n).

The higher the order of the Bezier curve, the greater the degree of optimization and the smoother the path. However, at the same time, the path is more prone to collision. The safety of the USV and the smoothness

Table 1
Calculation formula for Bezier curves.

| Control point | First-order | Second-order | Third-order |
|---|---|---|---|
| $P_0^0$ | | | |
| $P_1^0$ | $P_0^1 = (1 - t)P_0^0 + tP_1^0$ | | |
| $P_2^0$ | $P_1^1 = (1 - t)P_1^0 + tP_2^0$ | $P_0^2 = (1 - t)P_0^1 + tP_1^1$ | |
| $P_3^0$ | $P_2^1 = (1 - t)P_2^0 + tP_3^0$ | $P_1^2 = (1 - t)P_1^1 + tP_2^1$ | $P_0^3 = (1 - t)P_0^2 + tP_1^2$ |
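The recursion in Eq. (29) is the de Casteljau construction, so a short Python sketch may make the columns of Table 1 more concrete. The function names, the number of samples, and the example control points are illustrative assumptions and are not taken from the NSFQ implementation; how control points are chosen from the initial path is not covered here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def de_casteljau(control_points: Sequence[Point], t: float) -> Point:
    """Evaluate a Bezier curve at t in [0, 1] with the recursion of Eq. (29)."""
    if not control_points:
        raise ValueError("at least one control point is required")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    points = list(control_points)  # order-0 points P_i^0
    # Each pass maps the order-(k-1) points to the order-k points:
    # P_i^k = (1 - t) * P_i^(k-1) + t * P_(i+1)^(k-1)
    while len(points) > 1:
        points = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(points, points[1:])
        ]
    return points[0]


def sample_bezier(control_points: Sequence[Point], samples: int = 11) -> List[Point]:
    """Sample the curve at evenly spaced values of t, endpoints included."""
    if samples < 2:
        raise ValueError("samples must be at least 2")
    return [de_casteljau(control_points, i / (samples - 1)) for i in range(samples)]


# Hypothetical right-angle corner described by four control points (third order).
corner = [(0.0, 0.0), (10.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
smoothed = sample_bezier(corner)
```

With four control points the loop runs three times, which corresponds to the first-order, second-order, and third-order columns of Table 1. The curve starts at the first control point and ends at the last one, while the sharp corner in between is rounded off.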


Table 2
The complexity quantification of the simulation environment.

| Simulation environment | Env.1 | Env.2 | Env.3 | Env.4 |
|---|---|---|---|---|
| Number of static obstacles | 5 | 8 | 12 | 17 |
| Types of obstacles | 2 | 2 | 2 | 3 |
| Number of dynamic obstacles | 1 | 2 | 3 | 4 |
| Obstacle rate | 32.35 % | 56.38 % | 58.35 % | 65.2 % |
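The obstacle rate in Table 2 is the share of the map occupied by obstacles. As a rough illustration only, the sketch below assumes the map has been rasterized into a binary grid in which 1 marks an obstacle cell and 0 a feasible cell. The grid representation, the function name, and the toy map are assumptions; the source does not state how the rates in Table 2 were measured.

```python
from typing import Sequence


def obstacle_rate(grid: Sequence[Sequence[int]]) -> float:
    """Return the fraction of cells marked as obstacle (1) in a binary grid."""
    total = sum(len(row) for row in grid)
    if total == 0:
        raise ValueError("grid must contain at least one cell")
    blocked = sum(1 for row in grid for cell in row if cell == 1)
    return blocked / total


# Hypothetical 4 x 5 map: 1 = obstacle cell, 0 = feasible cell.
toy_map = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
rate = obstacle_rate(toy_map)  # 5 obstacle cells out of 20
```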

Fig. 7. Third-order Bezier curves.

4. Simulation results

Section 4.1 introduces the indicators used to evaluate the effectiveness of USV path planning, and Section 4.2 describes the simulation environment. Section 4.3 then validates the algorithm in self-created environments with randomly generated obstacle maps, and Section 4.4 validates it in a simulated marine environment.
In each simulation environment, evaluation indicators such as path length, sailing time, heading angle, and angular velocity are recorded. The NSFQ algorithm is compared with the other algorithms by analyzing these indicators, and the results demonstrate the superiority of the proposed algorithm.

4.1. Evaluation indicators

4.1.1. Path length

The economic efficiency of the USV is an important indicator to be considered during navigation. In general, the path length is directly related to the economic efficiency. The path length L of the USV from its starting position $(x_0, y_0)$ to its target position $(x_e, y_e)$ is given by Eq. (30).

$$L = \sum_{t=0}^{e-1} \sqrt{(y_{t+1} - y_t)^2 + (x_{t+1} - x_t)^2} \tag{30}$$

where t = 0, 1, 2, ..., e − 1, $(x_t, y_t)$ is the position coordinate of the current state after the USV adopts the Bezier curve, and $(x_{t+1}, y_{t+1})$ is the position coordinate of the next state after the USV adopts the Bezier curve.
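The path length of Eq. (30) is the sum of the straight-line distances between consecutive points of the smoothed path. A minimal Python sketch, assuming the path is available as a list of (x, y) tuples (the function name and the sample waypoints are hypothetical):

```python
import math
from typing import Sequence, Tuple


def path_length(points: Sequence[Tuple[float, float]]) -> float:
    """Sum the Euclidean distances between consecutive path points, Eq. (30)."""
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical waypoints: a 3-4-5 segment followed by a vertical 6 m segment.
waypoints = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
total = path_length(waypoints)
```

A path with fewer than two points has no segments, so the function returns 0 in that case.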

Table 3
Comparison of evaluation indicators for different algorithms.

| Environment | Indicator | Statistics | A* | RRT | CQL | NSFQ |
|---|---|---|---|---|---|---|
| Env.1 | Path length (m) | Mean | 1107.5231 | 1210.0397 | 1500 | 1089.7326 |
| | Sailing time (s) | Mean | 138.4404 | 151.255 | 187.5 | 136.2166 |
| | Heading angle ψ (°) | Mean | 45 | 45.57 | 44.7606 | 33.67 |
| | | Max. angle | 90 | 132.61 | 90 | 55.22 |
| | | Min. angle | 0 | −41.59 | 0 | −40.088 |
| | | Angle Std. Deviation | 19.8777 | 30.5505 | 45.0593 | 27.6145 |
| | | Number of sharp turns | 3 | 12 | 5 | 0 |
| | Angular velocity r (rad/s) | Max. angular velocity | 0.2326 | 0.1903 | 0.4652 | 0.1 |
| | | Min. angular velocity | −0.2326 | −0.1980 | −0.4652 | −0.1 |
| | | Max. r variations | 0.2326 | 0.2790 | 0.4652 | 0.1354 |
| Env.2 | Path length (m) | Mean | 898.32 | 990.5112 | 1200 | 891.0061 |
| | Sailing time (s) | Mean | 112.29 | 123.8139 | 150 | 111.3758 |
| | Heading angle ψ (°) | Mean | −45 | −45.9859 | −45.1495 | −40.6043 |
| | | Max. angle | 0 | −11.8583 | 0 | 12.8518 |
| | | Min. angle | −90 | −127.0361 | −90 | −55.9497 |
| | | Angle Std. Deviation | 22.5 | 32.2623 | 45.0747 | 19.4129 |
| | | Number of sharp turns | 17 | 12 | 8 | 0 |
| | Angular velocity r (rad/s) | Max. angular velocity | 0.4756 | 0.1796 | 0.4688 | 0.1000 |
| | | Min. angular velocity | −0.4756 | −0.3504 | −0.4688 | −0.0999 |
| | | Max. r variations | 0.9512 | 0.3685 | 0.4688 | 0.1998 |
| Env.3 | Path length (m) | Mean | 907.1068 | 987.9645 | 1200 | 888.246 |
| | Sailing time (s) | Mean | 113.3884 | 123.4956 | 150 | 111.0307 |
| | Heading angle ψ (°) | Mean | 57.8571 | 56.6305 | 44.5515 | 55.8211 |
| | | Max. angle | 90 | 128.4816 | 90 | 107.0832 |
| | | Min. angle | 45 | −24.1091 | 0 | 40.2539 |
| | | Angle Std. Deviation | 20.4019 | 31.1017 | 45.0727 | 13.0583 |
| | | Number of sharp turns | 14 | 15 | 7 | 0 |
| | Angular velocity r (rad/s) | Max. angular velocity | 0.4814 | 0.1833 | 0.4688 | 0.0876 |
| | | Min. angular velocity | −0.4814 | −0.2308 | −0.4688 | −0.0840 |
| | | Max. r variations | 0.9628 | 0.2926 | 0.4688 | 0.0813 |
| Env.4 | Path length (m) | Mean | 1028.0256 | 1133.5998 | 1400 | 1020.7495 |
| | Sailing time (s) | Mean | 128.5032 | 141.67 | 175 | 127.5937 |
| | Heading angle ψ (°) | Mean | −45 | −43.4330 | −45.1282 | −45.0195 |
| | | Max. angle | 0 | 39.7967 | 0 | −12.4306 |
| | | Min. angle | −90 | −97.0189 | −90 | −73.6010 |
| | | Angle Std. Deviation | 18.6113 | 30.6804 | 45.0641 | 11.8519 |
| | | Number of sharp turns | 11 | 15 | 13 | 0 |
| | Angular velocity r (rad/s) | Max. angular velocity | 0.4645 | 0.1817 | 0.4688 | 0.1000 |
| | | Min. angular velocity | −0.4645 | −0.1884 | −0.4688 | −0.1000 |
| | | Max. r variations | 0.9290 | 0.2598 | 0.4688 | 0.1354 |


Table 4
Comparison of path length performance for different algorithms.

| Environment | Path length vs A* | Path length vs RRT | Path length vs CQL |
|---|---|---|---|
| Env.1 | 1.606 % | 9.942 % | 27.351 % |
| Env.2 | 0.814 % | 10.046 % | 25.749 % |
| Env.3 | 2.079 % | 10.093 % | 25.980 % |
| Env.4 | 0.708 % | 9.955 % | 27.089 % |
| Mean | 1.302 % | 10.009 % | 26.542 % |

4.1.2. Heading angle


The variation in the heading angle reflects the heading stability and the number of rudder turns of the USV during navigation. Because a third-order Bezier curve is used to smooth the path, the position coordinates of the path change, and the heading angle of the USV is no longer restricted to the values of the action space designed in this paper. The heading angle of the USV therefore needs to be recalculated from the position coordinates fitted by the Bezier curve.
According to the model of the USV, as shown in Eq. (31) ~ Eq. (33), the heading angle ψ defined in this paper can be obtained by Eq. (34).
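$$\begin{cases} \dot{x} = u\cos\psi_0 \\ \dot{y} = u\sin\psi_0 \end{cases} \tag{31}$$

$$\tan\psi_0 = \dot{y}/\dot{x} \tag{32}$$

$$\psi_0 = \arctan\frac{y_{t+1} - y_t}{x_{t+1} - x_t} \tag{33}$$

$$\psi = \begin{cases} \psi_0, & x_{t+1} - x_t > 0 \\ \pi + \psi_0, & x_{t+1} - x_t < 0 \text{ and } y_{t+1} - y_t \ge 0 \\ -\pi + \psi_0, & x_{t+1} - x_t < 0 \text{ and } y_{t+1} - y_t < 0 \\ \pi/2, & x_{t+1} - x_t = 0 \text{ and } y_{t+1} - y_t > 0 \\ -\pi/2, & x_{t+1} - x_t = 0 \text{ and } y_{t+1} - y_t < 0 \end{cases} \tag{34}$$

where t = 0, 1, 2, ..., e − 1, $(x_t, y_t)$ is the position coordinate of the current state after the USV adopts the Bezier curve, and $(x_{t+1}, y_{t+1})$ is the position coordinate of the next state after the USV adopts the Bezier curve.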
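The case analysis of Eq. (34) can be written out directly, as in the sketch below. It follows the five cases literally and raises an error for two identical points, a case that Eq. (34) leaves undefined. The function name and the error handling are assumptions for illustration.

```python
import math
from typing import Tuple


def heading_angle(current: Tuple[float, float], nxt: Tuple[float, float]) -> float:
    """Heading angle in radians from two consecutive path points, Eq. (34)."""
    dx = nxt[0] - current[0]
    dy = nxt[1] - current[1]
    if dx == 0:
        if dy == 0:
            raise ValueError("heading is undefined for two identical points")
        return math.pi / 2 if dy > 0 else -math.pi / 2
    psi0 = math.atan(dy / dx)  # Eq. (33)
    if dx > 0:
        return psi0
    # dx < 0: shift into the second or third quadrant.
    return math.pi + psi0 if dy >= 0 else -math.pi + psi0
```

For every pair of distinct points this gives the same value as the two-argument arctangent `math.atan2(dy, dx)`, which may be the simpler choice in practice.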

4.1.3. Angular velocity


The angular velocity of the USV should not be too large. An excessive angular velocity means a significant change in the heading angle of the USV, a large rudder angle, and poor heading stability during navigation. Therefore, excessive angular velocity can lead to the risk of the USV capsizing. The angular velocity r of the USV is given by Eq. (35).

$$r = \dot{\psi} \tag{35}$$

where $\dot{\psi}$ is the time derivative of the heading angle of the USV.
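For a sampled path, one way to approximate Eq. (35) is a forward difference between consecutive heading angles. The sketch below assumes a uniform sampling interval `dt` and wraps each heading difference into [−π, π] so that a jump across ±π is not read as a full turn. The wrapping, the function names, and the sample values are assumptions for illustration; the source does not state how Eq. (35) is discretized.

```python
import math
from typing import List, Sequence


def wrap_angle(angle: float) -> float:
    """Map an angle in radians into the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def angular_velocity(headings: Sequence[float], dt: float) -> List[float]:
    """Forward-difference estimate of r = d(psi)/dt in rad/s."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [wrap_angle(b - a) / dt for a, b in zip(headings, headings[1:])]


# Hypothetical heading samples (rad) taken every 1 s.
rates = angular_velocity([0.0, 0.1, 0.3], dt=1.0)
```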

4.1.4. Specific quantified percentages

Fig. 8. Flow chart of path planning for USV based on NSFQ algorithm.

The percentage data in Table 4, Table 5, Table 6, and Table 8 quantify the advantage of the proposed algorithm over the comparison algorithms. These percentages are obtained by Eq. (36) ~ Eq. (39).

$$D_{\text{NSFQ vs A}^*} = \left( Data_{A^*} - Data_{NSFQ} \right) / Data_{A^*} \tag{36}$$

$$PD_{\text{NSFQ vs A}^*} = D_{\text{NSFQ vs A}^*} \cdot 100\% \tag{37}$$

$$D_{\text{NSFQ vs RRT}} = \left( Data_{RRT} - Data_{NSFQ} \right) / Data_{RRT} \tag{38}$$

$$PD_{\text{NSFQ vs RRT}} = D_{\text{NSFQ vs RRT}} \cdot 100\% \tag{39}$$

where $Data_{A^*}$, $Data_{RRT}$, and $Data_{NSFQ}$ represent the data of the A*, RRT, and NSFQ algorithms, respectively, which can be obtained from Tables 3 and 7. $D_{\text{NSFQ vs A}^*}$ and $D_{\text{NSFQ vs RRT}}$ are the relative differences after comparison, and $PD_{\text{NSFQ vs A}^*}$ and $PD_{\text{NSFQ vs RRT}}$ are the corresponding percentages.
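Eq. (36) ~ Eq. (39) amount to a relative reduction with respect to the baseline algorithm, expressed as a percentage. The sketch below applies them to the Env.1 mean path lengths listed in Table 3; the function names are illustrative assumptions.

```python
def relative_reduction(baseline: float, nsfq: float) -> float:
    """Eq. (36)/(38): reduction of the NSFQ value relative to a baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - nsfq) / baseline


def percent_reduction(baseline: float, nsfq: float) -> float:
    """Eq. (37)/(39): the same reduction expressed as a percentage."""
    return 100.0 * relative_reduction(baseline, nsfq)


# Mean path lengths (m) for Env.1, taken from Table 3.
a_star, rrt, nsfq = 1107.5231, 1210.0397, 1089.7326
vs_a_star = percent_reduction(a_star, nsfq)
vs_rrt = percent_reduction(rrt, nsfq)
```

Rounded to three decimals, these values are 1.606 % and 9.942 %, which agrees with the Env.1 row of Table 4. A negative result means the NSFQ value is larger than the baseline, as with the Env.1 heading angle standard deviation against A* in Table 5.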


Table 5
Comparison of heading angle standard deviation for different algorithms.

| Environment | Heading angle Std. Deviation vs A* | Heading angle Std. Deviation vs RRT | Heading angle Std. Deviation vs CQL |
|---|---|---|---|
| Env.1 | (−)38.922 % | 9.610 % | 38.715 % |
| Env.2 | 13.720 % | 39.828 % | 56.932 % |
| Env.3 | 35.995 % | 58.014 % | 71.028 % |
| Env.4 | 36.319 % | 61.370 % | 73.700 % |
| Mean | 11.778 % | 42.206 % | 60.094 % |

Table 8
Comparison of key evaluation indicators for different algorithms.

| Evaluation indicators | VS A* | VS RRT |
|---|---|---|
| Path Length | 3.466 % | 7.020 % |
| Sailing time | 3.466 % | 7.020 % |
| Angle Std. Deviation | 11.778 % | 18.093 % |
| Max. r variations | 73.490 % | 71.857 % |
Table 6
Comparison of maximum angular velocity r variations for different algorithms.

| Environment | Max. r variations vs A* | Max. r variations vs RRT | Max. r variations vs CQL |
|---|---|---|---|
| Env.1 | 41.788 % | 51.470 % | 70.894 % |
| Env.2 | 78.995 % | 45.780 % | 57.381 % |
| Env.3 | 91.556 % | 72.215 % | 82.658 % |
| Env.4 | 85.425 % | 47.883 % | 71.118 % |
| Mean | 74.441 % | 54.337 % | 70.513 % |

This paper considers the dynamic characteristics of the USV and simplifies the USV with length $L_s$ and width $B_s$ to a circle whose diameter is the length $L_s$, as shown in Fig. 9. The size of the safety circle needs to be dynamically adjusted for the USV at different speeds. The value of μ is determined by Eq. (40):

$$\mu = 1 + \frac{\sigma u}{u_{max}} \tag{40}$$

where $u$ is the speed of the USV, $u_{max}$ is the maximum speed of the USV, and σ is a coefficient that can be adjusted according to the specific situation.
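Eq. (40) makes the factor μ grow linearly with speed, from 1 at rest to 1 + σ at maximum speed. The sketch below computes μ and, as an assumption that is not stated in the text, uses it to scale the radius $L_s/2$ of the circle in Fig. 9. The function names, the scaling rule, and the numeric values are hypothetical.

```python
def safety_factor(u: float, u_max: float, sigma: float) -> float:
    """Speed-dependent factor mu = 1 + sigma * u / u_max, Eq. (40)."""
    if u_max <= 0:
        raise ValueError("u_max must be positive")
    if not 0 <= u <= u_max:
        raise ValueError("u must lie in [0, u_max]")
    return 1.0 + sigma * u / u_max


def safety_radius(length: float, u: float, u_max: float, sigma: float) -> float:
    """Assumed scaling: radius of the safety circle = mu * (L_s / 2)."""
    return safety_factor(u, u_max, sigma) * length / 2.0


# Hypothetical USV: 4 m long, sailing at half of its 8 m/s maximum, sigma = 0.5.
radius = safety_radius(length=4.0, u=4.0, u_max=8.0, sigma=0.5)
```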

4.2. Simulation environment

The effectiveness of the proposed algorithm is verified by simulation experiments. There are three types of static obstacles: circular obstacles, square obstacles, and irregular quadrilateral obstacles. Circular obstacles are primarily designed to simplify the modeling of relatively small irregular obstacles. Quadrilateral obstacles are mainly designed to simplify the modeling of relatively large irregular obstacles. Dynamic obstacles are mainly composed of square obstacles and irregular quadrilateral obstacles.
Obstacle rate refers to the proportion of the map occupied by obstacles, and the complexity of the simulation environment can be represented by the obstacle rate. Simulation environments 1, 2, 3, and 4 (Env.1-Env.4) simulate the complex marine environment of the USV during navigation, with the environments growing increasingly complicated. The complexity quantification of the simulation environment is shown in Table 2.

Fig. 9. Simplified model of the USV dimensions.

Table 7
Comparison of evaluation indicators for different algorithms in marine simulation environment.

| Environment | Indicator | Statistics | A* | RRT | NSFQ |
|---|---|---|---|---|---|
| Marine environment | Path length (km) | Mean | 107.7817 | 111.9018 | 104.0463 |
| | Sailing time (hour) | Mean | 3.7424 | 3.8855 | 3.6127 |
| | Heading angle ψ (°) | Mean | −45 | −43.9329 | −40.6431 |
| | | Max. angle | 0 | 36.1083 | 42.5650 |
| | | Min. angle | −90 | −131.5574 | −70.0487 |
| | | Angle Std. Deviation | 26.8130 | 28.8804 | 23.6550 |
| | | Number of sharp turns | 16 | 33 | 0 |
| | | Num. of over 45° turns | 16 | 25 | 0 |
| | Angular velocity r (rad/s) | Max. angular velocity | 0.2956 | 0.4283 | 0.1000 |
| | | Min. angular velocity | −0.2956 | −0.5114 | −0.1000 |
| | | Max. r variation | 0.5911 | 0.5568 | 0.1567 |

4.3. Simulation results in self-created environments

Due to the complexity and large scale of marine environments, grid-based modeling is not suitable for path planning in complex marine environments. In this paper, the simulated obstacle environment is created by randomly generating obstacle maps.
In order to verify the effectiveness and superiority of the NSFQ algorithm, it is compared with the RRT and A* algorithms in the same simulation environment. Various improved RRT and A* algorithms have been proposed for path planning, and it is difficult for this paper to introduce them one by one, so only the original RRT algorithm (Zhao et al., 2022) and the improved A* algorithm with 8 discrete actions (Wang et al., 2019c) are adopted in this research.
The path planning simulation diagrams using different methods for the USV in different simulation environments are shown in Fig. 10. Fig. 10 (a)~(d) represent simulation environments 1, 2, 3, and 4 (Env.1-Env.4), respectively, which simulate the complex marine environment of the USV during navigation and grow increasingly complicated. In Fig. 10, the blue pentagon represents the


starting point of the USV, and the red hexagon represents the ending point of the USV. The white area is the feasible area; the black area is the obstacle area, which is the prohibited area, simulating obstacles that the USV may encounter in the marine environment, such as islands, rivers, and ports. The dashed envelope area is the prohibited area after considering the safety threshold ρ.
In Fig. 10, the red curve represents the path trajectory of the USV using the NSFQ algorithm, the blue broken line represents the path trajectory of the USV using the RRT algorithm, the green broken line represents the path trajectory of the USV using the A* algorithm, and the purple broken line represents the path trajectory of the USV using the classical Q-Learning (CQL) algorithm. In Fig. 10 (a)~(c), there are more obstacles at the starting point; in Fig. 10 (b), there are more obstacles at the ending point; and in Fig. 10 (d), there are more obstacles at both the starting and ending points, which simulates the situation of the USV entering and leaving ports.
In the four different simulation environments of Fig. 10, the NSFQ algorithm can effectively solve the global path planning and obstacle avoidance problems of the USV. The A* algorithm uses grid-based modeling, so the randomly generated obstacle map needs to be divided into grids, and the feasible area and prohibited area need to be identified. Therefore, after the simulation environment is converted into a grid map, multiple consecutive sharp turns may occur during path planning of the USV, which does not meet the maneuvering characteristics of the USV in actual navigation, as shown in Fig. 11.
The RRT algorithm does not require specific modeling of the simulation environment during path planning, but the feasible path planned by the RRT algorithm is not optimal, and the path length is relatively long. In addition, the path planned by the RRT algorithm is winding and has too many unnecessary turns, which also does not meet the turning performance and maneuvering characteristics of the USV during navigation, as shown in Fig. 10 (a)~(d). Compared with the RRT and A* algorithms, the path planned by the NSFQ algorithm is smoother, with better path planning results. The turning performance, maneuvering characteristics, and heading stability of the USV during navigation and obstacle avoidance are taken into account by the algorithm.
In order to comprehensively verify the performance of the proposed algorithm, the evaluation indicators introduced in Section 4.1 are used to evaluate the algorithm. The evaluation indicators are path length, sailing time, heading angle, and angular velocity, as shown in Table 3.
Combining the data in Table 3 with Fig. 10, it can be concluded that the proposed algorithm outperforms the A* and RRT algorithms in terms of path length, as shown in Table 4. Compared with the A*, RRT and CQL algorithms, the mean path length of the proposed algorithm is reduced by 1.302 %, 10.009 % and 26.542 % in the four different simulation environments, respectively. Because the planned path is shorter, the USV consumes less energy and needs less sailing time.
The variation curves of the heading angle ψ of the USV using different methods in different simulation environments are shown in Fig. 12. The heading angle of the USV using the A* and RRT algorithms changes too often and may change suddenly. Due to the high speed and inertia of the USV, its turning is not flexible, and the change of its heading angle should be smooth to maintain the stability of the USV heading. In Fig. 12, the change curve of the USV heading angle using the NSFQ algorithm is relatively smooth, and the heading is stable.
The comparison of the heading angle standard deviation for different algorithms is shown in Table 5. It can be concluded that the proposed algorithm outperforms the A* and RRT algorithms in terms of the heading angle standard deviation. Compared with the A*, RRT and CQL algorithms, the mean heading angle standard deviation of the proposed

Fig. 10. Path planning of USV using different methods in different simulation environments.


Fig. 11. Partial enlarged view using the A* algorithm in Env.3.

algorithm is reduced by 11.778 %, 42.206 % and 60.094 % in the four different simulation environments, respectively. It is worth noting that the A* algorithm has the smallest heading angle standard deviation in Env.1, where it is better than the NSFQ algorithm. However, as the simulation environments become increasingly complex, the superiority of the NSFQ algorithm becomes more and more apparent. The NSFQ algorithm has the smallest heading angle standard deviation in Env.2, Env.3 and Env.4.
The variation curves of the angular velocity r of the USV using different methods in different simulation environments are shown in Fig. 13. The magnitude of the angular velocity of the USV represents the change in its heading angle. The angular velocity of the USV should not be too large, as an excessive angular velocity indicates a significant change in its heading angle, and the USV has large rudder turns and poor heading stability during navigation, which can lead to the risk of capsizing.
The comparison of the maximum angular velocity variations for different algorithms is shown in Table 6. It can be concluded that the proposed algorithm outperforms the A* and RRT algorithms in terms of the maximum angular velocity variation. Compared with the A*, RRT and CQL algorithms, the mean maximum angular velocity variation of the proposed algorithm is reduced by 74.441 %, 54.337 % and 70.513 % in the four different simulation environments, respectively.
However, in practice, the environment is not completely known; it is

Fig. 12. The variation of the heading angle of the USV.


partially known or completely unknown. In dynamic and uncertain marine environments, it is difficult or even impossible to obtain information about the various obstacles before path planning (Cheng et al., 2021). This requires the USV to be equipped with sensors (radar, lidar, electro-optical (EO) camera, infrared (IR) camera systems, etc.) to obtain information on the position, heading and speed of the surrounding vessels (Han et al., 2020). In an unknown obstacle environment, static obstacles and dynamic obstacles around the USV are detected by sensors, the map is updated in real time, and the USV switches to real-time online planning to avoid the obstacles.
The obstacles on the water surface are not only static; there are also movable obstacles, such as other sailing ships. It is assumed that the direction and speed of dynamic obstacles are known. The results of the proposed method are shown in Fig. 14.
In summary, the USV can effectively avoid obstacles in both static and dynamic obstacle environments. The NSFQ algorithm proposed in this paper can effectively solve the global path planning and obstacle avoidance problems of the USV. The proposed algorithm is superior to the A* and RRT algorithms in evaluation indicators such as path length, sailing time, heading angle, and angular velocity, thereby proving the effectiveness and superiority of the NSFQ algorithm.

4.4. Simulation results in marine environment

Finally, in order to verify the feasibility and efficiency of the NSFQ algorithm, a map of a certain sea area is selected as the marine simulation environment for path planning, as shown in Fig. 15 (a). The sea area is dominated by islands, with narrow waterways and complicated paths, and the obstacles are scattered and numerous.
During the path planning process, image binarization is performed on the map to reduce the computational cost of processing the image and the storage requirements, as shown in Fig. 15 (b). Image binarization refers to converting the pixel values in the image into two possible values: 0 (white, representing the feasible area) and 1 (black, representing the prohibited area), which makes it easier for computers to process and parse map data, thereby improving the efficiency of path planning algorithms.
The path planning simulation diagrams using different methods for the USV in the marine simulation environment are shown in Fig. 16. The variation of heading angle and angular velocity of the USV using different methods in the marine environment is shown in Fig. 17. Compared with the A* and RRT algorithms, the NSFQ algorithm has a smoother path and better path planning effectiveness. The specific evaluation indicators are shown in Table 7.
The dynamic characteristics of the USV are influenced by wind, waves, and currents. Wind and waves are largely not considered here, because the controller of the ship itself can keep its heading unaffected. Currents also have a limited impact on the USV and are discussed in this paper using a flow field simulation model with a known direction parallel to the x-axis and a speed of 1 m/s, as shown in Fig. 16 (b).
The comparison of key evaluation indicators for different algorithms is shown in Table 8. It can be concluded that, compared with the A* and RRT algorithms, the path length of the NSFQ algorithm is reduced by 3.466 % and 7.020 % in the marine simulation environment, respectively. The sailing time is reduced by 3.466 % and 7.020 %, respectively. The heading angle standard deviation is reduced by 11.778 % and 18.093 %, respectively. The maximum angular velocity variation is reduced by 73.490 % and 71.857 %, respectively. The effectiveness and superiority of the NSFQ algorithm are thus demonstrated.
In the marine simulation environment, the NSFQ algorithm proposed in this paper is superior to the A* and RRT algorithms in evaluation indicators such as path length, sailing time, heading angle, and angular velocity, thereby proving the effectiveness and superiority of the NSFQ

Fig. 13. The variation of the angle velocity of the USV.


Fig. 14. Path planning of USV using different methods in dynamic simulation environments.

Fig. 15. Marine simulation environment for path planning.

algorithm. However, the NSFQ algorithm still has a limitation: its computation time is longer than that of other traditional algorithms.

5. Conclusion

In this paper, an NSFQ algorithm is proposed for the global path planning and obstacle avoidance problems of the USV. In the proposed algorithm, the RBF neural network is used instead of the Q-table, which makes it faster and more efficient to approximate the Q function. The next position of the USV can be inferred from the current position by using the model of the USV. The initial heading angle of the USV can be obtained from the action space. The position and initial heading angle of the USV at each moment are reconstructed into the state space. Meanwhile, the position and heading angle of the USV are also fundamental factors in the action space and reward function. Moreover, a safety threshold, which is twice the length of the USV, is introduced to ensure the safety of the USV. A third-order Bezier curve is used to smooth the initial path so that the USV can maintain its heading stability during navigation and obstacle avoidance.
Finally, to verify the effectiveness and superiority of the NSFQ algorithm, different simulation environments are established, which are


Fig. 16. Path planning of the USV using different methods in marine environment.

Fig. 17. The variation of heading angle and angular velocity of the USV.

the randomly generated obstacle maps and the simulation map of the marine environment. The data analysis results show that the path length, sailing time, heading angle, angular velocity, and smoothness obtained by the proposed algorithm are significantly better than those of the A* and RRT algorithms. This work has two limitations. First, the speed of the USV must be set in advance and cannot be adjusted in real time during navigation, which requires further research in the future. Second, all obstacles in this paper are known, and further research is needed on how to avoid obstacles in unknown environments using the proposed algorithm.

CRediT authorship contribution statement

Yuanhui Wang: Supervision, Validation, Conceptualization, Formal analysis, Funding acquisition, Resources. Changzhou Lu: Conceptualization, Software, Validation, Writing – original draft, Writing – review & editing. Peng Wu: Supervision, Validation, Project administration. Xiaoyue Zhang: Data curation, Project administration, Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

References

Bianchi, R., Ribeiro, C., Costa, A., 2008. Accelerating autonomous learning by using heuristic selection of actions. J. Heuristics 14 (2), 135–168.
Chen, C., Chen, X.Q., Ma, F., Zeng, X.J., Wang, J., 2019. A knowledge-free path planning approach for smart ships based on reinforcement learning. Ocean Eng. 189, 9.
Cheng, C., Sha, Q., He, B., Li, G., 2021. Path planning and obstacle avoidance for AUV: a review. Ocean Eng. 235, 109355.
Duan, J.M., Chen, Q.L., 2019. Prior knowledge based Q-learning path planning algorithm. Electron. Opt. Control 26 (9), 29–33.
Fu, J.J., Khan, F., 2020. Monitoring and modeling of environmental load considering dependence and its impact on the failure probability. Ocean Eng. 199.
Han, J., Cho, Y., Kim, J., Kim, J., Son, N.S., 2020. Autonomous collision detection and avoidance for ARAGON USV: Development and field tests 37 (6), 987–1002.


Hao, B., Du, H., Yan, Z.P., 2023. A path planning approach for unmanned surface vehicles based on dynamic and fast Q-learning. Ocean Eng. 270.
Hart, P.E., Nilsson, N.J., Raphael, B., 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4 (2), 100–107.
Jin, J., Zhang, J., Shao, F., Lyu, Z., 2018a. A novel ocean bathymetry technology based on an unmanned surface vehicle. Acta Oceanol. Sin. 37 (9), 99–106.
Jin, J., Zhang, J., Shao, F., Zhichao, L., Wang, D., 2018b. A novel ocean bathymetry technology based on an unmanned surface vehicle. Acta Oceanol. Sin. 37 (9), 99–106.
Lan, W., Jin, X., Chang, X., Wang, T.L., Zhou, H., Tian, W., Zhou, L.L., 2022. Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning. Ocean Eng. 262.
Li, X.W., Gao, M., Kang, Z., Sun, H.X., Liu, Y.C., Yao, C.Y., Zhang, A.M., 2023. Collaborative search and rescue based on swarm of H-MASSs using consensus theory. Ocean Eng. 278.
Low, E.S., Ong, P., Cheah, K.C., 2019. Solving the optimal path planning of a mobile robot using improved Q-learning. Robot. Autonom. Syst. 115, 143–161.
Mac, T.T., Copot, C., Tran, D.T., Keyser, R.D., 2016. Heuristic approaches in robot path planning: a survey. Robot. Autonom. Syst. 86, 13–28.
Marthi, Bhaskara, 2007. Automatic shaping and decomposition of reward functions. Proceedings of the 24th International Conference on Machine Learning. ACM, USA, pp. 601–608.
Ntakolia, C., Lyridis, D.V., 2022. A comparative study on Ant Colony Optimization algorithm approaches for solving multi-objective path planning problems in case of unmanned surface vehicles. Ocean Eng. 255.
Owen, I., Lee, R., Wall, A., Fernandez, N., 2021. The NATO generic destroyer a shared geometry for collaborative research into modelling and simulation of shipboard helicopter launch and recovery. Ocean Eng. 228.
Ozturk, U., Akdağ, M., Ayabakan, T., 2022. A review of path planning algorithms in maritime autonomous surface ships: navigation safety perspective. Ocean Eng. 251.
Sang, H.Q., You, Y.S., Sun, X.J., Zhou, Y., Liu, F., 2021. The hybrid path planning algorithm based on improved A* and artificial potential field for unmanned surface vehicle formations. Ocean Eng. 223.
Singh, Y., Sharma, S., Sutton, R., Hatton, D., Khan, A., 2018. A constrained A* approach towards optimal path planning for an unmanned surface vehicle in a maritime environment containing dynamic obstacles and ocean currents. Ocean Eng. 169, 187–201.
Sun, Y.S., Wang, L.F., Wu, J., Ran, X.R., 2020. A general overview of path planning methods for autonomous underwater vehicle. Ship Science and Technology 42 (7), 1–7.
Wang, C.B., Zhang, X.Y., Zou, Z.Q., Wang, S.B., 2018. On path planning of unmanned ship based on Q-learning. Ship & Ocean Engineering 47 (5), 168–171.
Wang, D.X., 2021. AGV path planning based on improved Q-learning algorithm. Electronic Design Engineering 29 (4), 7–10+15.
Wang, H., Mao, W., Eriksson, L., 2019a. A Three-Dimensional Dijkstra’s algorithm for multi-objective ship voyage optimization. Ocean Eng. 186.
Wang, J., Zhang, P.L., Zhao, Z.Y., Cheng, X.P., 2019b. Path planning method based on neural network and Q(λ)-learning. Automation and Instrumentation 34 (9), 1–4.
Wang, Z.Y., Zeng, G.H., Huang, B., Fang, Z.J., 2019c. Global optimal path planning for robots with improved A* algorithm. J. Comput. Appl. 39 (9), 2517–2522.
Watkins, C., Dayan, P., 1992. Q-learning. Mach. Learn. 8 (3–4), 279–292.
Wei, Y.L., Jin, W.Y., 2019. Intelligent Vehicle Path Planning Based on Neural Network Q-learning Algorithm. Fire Contr. Command Contr. 44 (02), 46–49.
Xin, J.F., Zhong, J.B., Yang, F.R., Cui, Y., Sheng, J.L., 2019. An improved genetic algorithm for path-planning of unmanned surface vehicle. Sensors 19 (11).
Xu, X.S., Yuan, J., 2019. Path planning for mobile robot based on improved reinforcement learning algorithm. Journal of Chinese Inertial Technology 27 (3), 314–320.
Yang, X.F., Shi, Y.L., Liu, W., Ye, H., Zhong, W.B., Rong, X.Z., 2022. Global path planning algorithm based on double DQN for multi-tasks amphibious unmanned surface vehicle. Ocean Eng. 266.
Yao, P., Lou, Y.T., Zhang, K.M., 2023. Multi-USV cooperative path planning by window update based self-organizing map and spectral clustering. Ocean Eng. 275.
Zhao, C., Zhu, Y.F., Du, Y.C., Liao, F.X., Chan, C.Y., 2022. A novel direct trajectory planning approach based on generative adversarial networks and rapidly-exploring random tree. IEEE Trans. Intell. Transport. Syst. 23 (10), 17910–17921.
