Abstract
Multi-missile systems operating in denied environments are vulnerable to malicious jamming, which degrades the availability of inter-missile links and the timeliness of data transmission. To address this problem, a cooperative multi-missile anti-jamming algorithm based on multi-agent deep deterministic policy gradient (MADDPG) is proposed. A decentralized partially observable Markov decision process (Dec-POMDP) is formulated with the power consumption and data transmission latency of the multi-missile system as constraints, and all nodes share a single global reward function. The algorithm adopts a centralized training, distributed execution framework: during training, the critic network of each agent collects the state and action information of all agents; during execution, only the actor network of each agent makes decisions, based on local information. Simulation results show that, compared with the relay-forwarding-first and direct-forwarding-first strategies, the proposed algorithm enables missile agents to make power allocation decisions adaptively from partially observable state information, effectively improving the cooperative anti-jamming performance of distributed multi-missile systems.
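The centralized training, distributed execution split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the dimensions (3 agents, 4-element observations, maximum power 2.0), and the random linear weights standing in for neural networks are all assumptions made for illustration, and no training update is shown. The sketch only shows which inputs each component sees.

```python
import math
import random


class Actor:
    """Deterministic policy: one agent's local observation -> transmit power.

    Hypothetical stand-in for an actor network (a single linear layer).
    """

    def __init__(self, obs_dim, p_max, rng):
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(obs_dim)]
        self.p_max = p_max

    def act(self, obs):
        z = sum(w * o for w, o in zip(self.w, obs))
        # A sigmoid squashes the output into the open interval (0, p_max).
        return self.p_max / (1.0 + math.exp(-z))


class CentralCritic:
    """Value estimate from the joint observations and actions of all agents.

    Hypothetical stand-in for a critic network (a single linear layer).
    """

    def __init__(self, n_agents, obs_dim, rng):
        in_dim = n_agents * (obs_dim + 1)  # every observation plus one action per agent
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(in_dim)]

    def q_value(self, all_obs, all_actions):
        joint = [x for obs in all_obs for x in obs] + list(all_actions)
        return sum(w * x for w, x in zip(self.w, joint))


# Assumed toy dimensions, not taken from the paper.
N_AGENTS, OBS_DIM, P_MAX = 3, 4, 2.0
rng = random.Random(0)

actors = [Actor(OBS_DIM, P_MAX, rng) for _ in range(N_AGENTS)]
critics = [CentralCritic(N_AGENTS, OBS_DIM, rng) for _ in range(N_AGENTS)]
observations = [[rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)]
                for _ in range(N_AGENTS)]

# Execution phase: each actor decides from its own local observation only.
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]

# Training phase: each agent's critic scores the joint observations and actions.
q_values = [critic.q_value(observations, actions) for critic in critics]
```

In a full MADDPG setup the critics would be used only to compute gradients for the actors during training and would be discarded at execution time, which is why each actor's input here is restricted to its own observation.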
Authors
王瑞东
王世练
张炜
张政
Wang Ruidong; Wang Shilian; Zhang Wei; Zhang Zheng (College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China; Unit 78020, People's Liberation Army of China, Kunming 650500, China)
Source
《战术导弹技术》
Peking University Core Journals (北大核心)
2022, No. 4, pp. 187-195 (9 pages)
Tactical Missile Technology