Abstract
This paper studies maneuvering decision-making for multi-UAV attack-defence confrontation in a complex environment with randomly distributed obstacles. Motion models and a radar detection model are constructed for both the attacking and defending sides. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is extended to the multi-agent setting to address the value-function overestimation problem of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. On this basis, to improve learning efficiency, a Prioritized Experience Replay Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (PER-MATD3) algorithm is proposed by incorporating the prioritized experience replay mechanism. Simulation experiments show that the proposed method achieves good confrontation performance in multi-UAV attack-defence maneuvering decision-making, and comparative experiments verify the advantages of PER-MATD3 over the other algorithms in convergence speed and stability.
Authors
符小卫
徐哲
朱金冬
王楠
FU Xiaowei; XU Zhe; ZHU Jindong; WANG Nan (School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China; Xi’an Institute of Applied Optics, Xi’an 710065, China; AVIC Shenyang Aircraft Design Research Institute, Shenyang 110035, China)
Source
《航空学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2023, No. 7, pp. 191-204 (14 pages)
Acta Aeronautica et Astronautica Sinica
Funding
Aeronautical Science Foundation of China (2020Z023053001).
Keywords
multi-UAVs
multi-agent reinforcement learning
PER-MATD3
attack-defence confrontation
maneuvering decision-making