A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm
Abstract
1. Introduction
2. Background
2.1. Dubins Path
2.2. Reinforcement Learning
2.2.1. Basics of Reinforcement Learning
2.2.2. Policy-Based RL
3. RL Formulation
3.1. 4D Waypoints Design
3.2. Fly to Waypoints
3.3. Training Optimization
3.3.1. Multi-Layer RL Algorithm
- Position control layer: controls the heading angle of the aircraft;
- Altitude control layer: controls the vertical velocity of the aircraft;
- Velocity control layer: controls the horizontal velocity of the aircraft (a minimal sketch of this layered structure follows Algorithm 1 below).
Algorithm 1. Multi-layer RL algorithm.
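The layered decomposition can be read as three independent policies that share one observation and each emit one component of the guidance command. The sketch below is a minimal illustration of that structure only, assuming this interpretation; the class, method, and argument names (`MultiLayerGuidancePolicy`, `act`) are illustrative and not the authors' code.

```python
# Minimal sketch of the layered control structure described above (not the
# authors' implementation): one policy per control layer, each mapping the
# shared observation to its own action.
import numpy as np

class MultiLayerGuidancePolicy:
    def __init__(self, heading_policy, vertical_policy, speed_policy):
        self.heading_policy = heading_policy    # position control layer
        self.vertical_policy = vertical_policy  # altitude control layer
        self.speed_policy = speed_policy        # velocity control layer

    def act(self, state):
        """Query each layer and assemble the composite guidance command."""
        return {
            "heading_angle_deg": self.heading_policy(state),
            "vertical_velocity_mps": self.vertical_policy(state),
            "horizontal_velocity_kmh": self.speed_policy(state),
        }

# Placeholder policies that ignore the state, just to show the call pattern.
policy = MultiLayerGuidancePolicy(
    heading_policy=lambda s: 0.0,    # hold current heading
    vertical_policy=lambda s: 3.0,   # climb at 3 m/s
    speed_policy=lambda s: 500.0,    # fly at 500 km/h
)
print(policy.act(np.zeros(8)))
```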
3.3.2. State Space
3.3.3. Action Space
3.3.4. Termination State
- Running out of time. Each training episode lasts at most 300 steps; if the agent exceeds 300 steps, time runs out and the environment is reset.
- Reaching the goal. If the agent-goal distance is less than 2 km and the heading-angle error is less than 28°, the aircraft is considered to have reached the goal (a minimal check of both conditions is sketched after this list).
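The two termination conditions amount to a simple per-step check. The sketch below uses the thresholds quoted in the text (300-step limit, 2 km goal distance, 28° heading error); the function name and signature are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the termination check described above; thresholds follow
# the text, while the function and argument names are illustrative.
def is_terminal(step, distance_to_goal_km, heading_error_deg,
                max_steps=300, goal_distance_km=2.0, goal_heading_deg=28.0):
    """Return (done, reason) for the two termination conditions."""
    if step >= max_steps:
        return True, "timeout"        # running out of time
    if distance_to_goal_km < goal_distance_km and heading_error_deg < goal_heading_deg:
        return True, "goal_reached"   # reached the goal
    return False, None

print(is_terminal(step=120, distance_to_goal_km=1.5, heading_error_deg=10.0))
# (True, 'goal_reached')
```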
3.3.5. Reward Function Design
4. Numerical Experiment
4.1. Experiment Setup
4.2. Models and Training
4.2.1. Models
4.2.2. Training
5. Analysis of Results
5.1. Training Performance
5.1.1. Without Considering Arrival Time
5.1.2. Considering Arrival Time
5.2. Multi-Aircraft Performance
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Coefficient | Value
---|---
 | 0.02
 | 0.006
 | 0.5
 | 1
 | 0.02
Parameter | Value
---|---
Airport latitude |
Airport longitude |
Airport runway orientation |
Aircraft type | F-16
x range | [103.4°, 104.5°]
y range | [30.1°, 30.6°]
z range | [0 m, 1000 m]
Time interval | 1 s
Initial horizontal velocity | 500 km/h
Maximum horizontal velocity | 500 km/h
Minimum horizontal velocity | 270 km/h
Maximum climb velocity | 3 m/s
Minimum descent velocity | m/s
Maximum turn rate | m/s
Maximum acceleration | m/s
Termination | distance < 2 km and delta heading angle < 28°, or arrival time < 0
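For reference, the position ranges in the table can be used to draw random initial aircraft states when the environment resets. The snippet below is only an illustration of that use under these assumptions; the dictionary keys and the function name are not from the paper.

```python
# Minimal sketch of sampling a random initial aircraft state from the ranges in
# the table above (x/y in degrees, z in metres, speed in km/h); the key and
# function names are illustrative assumptions.
import random

ENV_RANGES = {
    "x_deg": (103.4, 104.5),   # longitude range
    "y_deg": (30.1, 30.6),     # latitude range
    "z_m": (0.0, 1000.0),      # altitude range
}

def sample_initial_state():
    """Uniformly sample a start position; initial speed is 500 km/h per the table."""
    return {
        "x_deg": random.uniform(*ENV_RANGES["x_deg"]),
        "y_deg": random.uniform(*ENV_RANGES["y_deg"]),
        "z_m": random.uniform(*ENV_RANGES["z_m"]),
        "speed_kmh": 500.0,
    }

print(sample_initial_state())
```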
Parameter | Value
---|---
Replay buffer size | 200,000
Discount factor | 0.98
Learning rate |
Mini batch size | 1000
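As a rough illustration of how these hyperparameters fit together, the sketch below collects them into a configuration and a fixed-size replay buffer. The `ReplayBuffer` class and the configuration keys are assumptions rather than the authors' training code, and the learning rate is omitted because its value is not given above.

```python
# Minimal sketch grouping the training hyperparameters above; the ReplayBuffer
# class and the configuration keys are illustrative assumptions.
import random
from collections import deque

TRAIN_CONFIG = {
    "replay_buffer_size": 200_000,
    "discount_factor": 0.98,
    "mini_batch_size": 1000,
}

class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buffer = ReplayBuffer(TRAIN_CONFIG["replay_buffer_size"])
```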
Algorithm | Maximum Success Rate (%) | Average Computational Time (ms)
---|---|---
Multi-layer approach without reward shaping | 17 | 3.4
Multi-layer approach with reward shaping | 95 | 3.5
Non-layered approach without reward shaping | 6 | 2.9
Non-layered approach with reward shaping | 5 | 2.7
Algorithm | Maximum Success Rate (%) | Average Computational Time (ms)
---|---|---
Multi-layer approach without reward shaping | 15 | 5.5
Multi-layer approach with reward shaping | 92 | 5.8
Non-layered approach without reward shaping | 5 | 3.6
Non-layered approach with reward shaping | 2 | 3.4
Spawn Location | Average Distance (km) | Average Delta Heading Angle (°)
---|---|---
Aircraft in the same airspace | 0.360 | 13.7
Aircraft in different airspaces | 0.598 | 14.7
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).