UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience

Abstract: Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). We then present a reward shaping method for the guidance-towards-area and guidance-towards-specific-point tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy's output during the later stage of the training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and by extensive experimental results from testing the trained policy.
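The abstract names potential-based reward shaping but does not give the shaping function itself. As a rough illustration only, the sketch below assumes the standard potential-based form F(s, s') = γΦ(s') − Φ(s) and a hypothetical potential Φ equal to the negative Euclidean distance from the UAV to a target point. The function names, the discount factor value, and the choice of potential are assumptions for illustration, not the authors' formulation; the expert-guided advice term is not modeled here.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, not taken from the paper)


def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target point."""
    return -math.dist(position, target)


def shaped_reward(base_reward, position, next_position, target, gamma=GAMMA):
    """Return the base reward plus the shaping term F = gamma * phi(s') - phi(s)."""
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return base_reward + shaping


if __name__ == "__main__":
    target = (0.0, 0.0, 0.0)
    # A step that moves the UAV closer to the target receives a positive bonus.
    print(shaped_reward(0.0, (3.0, 4.0, 0.0), (0.0, 0.0, 0.0), target))
```

With this form the shaping terms telescope along a trajectory, which is why potential-based shaping is generally expected to leave the optimal policy unchanged while giving denser feedback during training.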

Authors: ZHAN Guang, ZHANG Kun, LI Ke, PIAO Haiyin

Affiliations: School of Electronics and Information; Science and Technology on Electro-Optic Control Laboratory

Source: Journal of Systems Engineering and Electronics (系统工程与电子技术（英文版）), SCIE, CSCD, 2024, No. 3, pp. 644-665 (22 pages)

Funding: Supported by the Key Research and Development Program of Shaanxi (2022GXLH-02-09), the Aeronautical Science Foundation of China (20200051053001), and the Natural Science Basic Research Program of Shaanxi (2020JM-147).

Keywords: unmanned aerial vehicle (UAV); maneuvering decision-making; autonomous air-delivery; deep reinforcement learning; reward shaping; expert experience

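Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; V279 [Aerospace Science and Technology: Aircraft Design]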
