UAV formation path planning approach incorporating dynamic reward strategy
Abstract To address unmanned aerial vehicle (UAV) formation path planning in unknown dynamic environments, an intelligent decision scheme for UAV formations is proposed, based on a multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem, a central concern in UAV formation path planning, is analyzed in depth: the formation should fly with a stable structure while fine-tuning its shape according to the surrounding environment. In essence, the spacing between every pair of UAVs stays relatively stable but is also adjusted slightly in response to the external environment. To this end, a reward function based on the optimal spacing and the current spacing between each pair of UAVs is designed. On this basis a dynamic formation reward function is proposed and combined with the multi-agent twin delayed deep deterministic (MATD3) algorithm to obtain the MATD3-IDFRF algorithm. Finally, comparison experiments are designed. In the composite obstacle environment, the proposed dynamic formation reward function improves the algorithm's success rate by 6.8%, improves the average reward after convergence by 2.3%, and reduces the formation deformation rate by 97%.
Authors TANG Heng; SUN Wei; LYU Lei; HE Ruofei; WU Jianjun; SUN Changhao; SUN Tianye (School of Aerospace Science and Technology, Xidian University, Xi'an 710118, China; The 365th Research Institute, Northwestern Polytechnical University, Xi'an 710072, China; Xi'an ASN UAV Technology Co., Ltd., Xi'an 710065, China; Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China)
Source Systems Engineering and Electronics (《系统工程与电子技术》), indexed in EI, CSCD, and the Peking University Core Journals list; 2024, No. 10, pp. 3506-3518 (13 pages)
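Funding Supported by the China University Industry-University-Research Innovation Fund (2021ZYA08004), the Xi'an Science and Technology Plan (2022JH-RGZN-0039), the Key Industry Innovation Chain Project of the Shaanxi Provincial Key Research and Development Program (2022ZDLGY03-01), and the National Natural Science Foundation of China (62173330).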
Keywords reinforcement learning (RL); reward function; unmanned aerial vehicle (UAV); dynamic formation; path planning
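
The abstract describes a reward built from the optimal spacing and the current spacing between each pair of UAVs, but it does not give the functional form. The sketch below is only one plausible reading, not the paper's reward: it penalizes the absolute deviation of each pairwise distance from its optimal value, and a tolerance band stands in for the "fine-tuning" the formation is allowed. The names `formation_reward`, `optimal_spacing`, and `tolerance` are hypothetical.

```python
import itertools
import math


def formation_reward(positions, optimal_spacing, tolerance=0.0):
    """Illustrative pairwise-spacing reward (assumed form, not from the paper).

    positions: list of coordinate tuples, one per UAV.
    optimal_spacing: dict mapping an index pair (i, j), i < j, to the
        desired distance between UAV i and UAV j.
    tolerance: deviation that is allowed without penalty (assumption that
        models small formation adjustments around obstacles).
    """
    total = 0.0
    # Visit every unordered pair of UAVs exactly once.
    for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2):
        current = math.dist(p, q)  # Euclidean distance, Python 3.8+
        deviation = abs(current - optimal_spacing[(i, j)])
        # Only the part of the deviation beyond the tolerance is penalized.
        total -= max(0.0, deviation - tolerance)
    return total


if __name__ == "__main__":
    spacing = {(0, 1): 5.0, (0, 2): 10.0, (1, 2): 5.0}
    in_formation = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    deformed = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
    print(formation_reward(in_formation, spacing))
    print(formation_reward(deformed, spacing))
    print(formation_reward(deformed, spacing, tolerance=1.0))
```

A reward of this shape is zero when every pair sits at its optimal spacing and becomes more negative as the formation deforms. In a multi-agent setting it would typically be added to the navigation and collision terms each agent already receives, with a weight that is itself a tuning choice.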