
伴随压制干扰与组网雷达功率分配的深度博弈研究 (cited by: 4)

Deep Game of Escorting Suppressive Jamming and Networked Radar Power Allocation
Abstract Traditional networked radar power allocation is typically optimized under a given jamming model, while jammer resource allocation is optimized under a given radar power allocation method; such research lacks gaming and interaction. In view of combat scenarios in which radars and jammers increasingly compete against each other, this paper proposes a deep game problem of networked radar power allocation under escort suppression jamming, in which the intelligent target suppression jamming is trained using Deep Reinforcement Learning (DRL). First, the jammer and the networked radar are mapped to two agents. Based on the jamming model and the radar detection model, the target detection model of the networked radar under suppression jamming is established, together with an objective function that maximizes the target detection probability. For the networked radar agent, the radar power allocation vector is generated by a Proximal Policy Optimization (PPO) policy network. For the jammer agent, a hybrid policy network is designed to generate beam selection and power allocation actions simultaneously. Domain knowledge is introduced to construct more effective reward functions: three kinds of domain knowledge, namely the target detection model, the equal power allocation strategy, and the greedy jamming power allocation strategy, are used to generate guided rewards for the networked radar agent and the jammer agent, thereby improving the agents' learning efficiency and performance. Finally, alternating training is used to learn the policy network parameters of both agents. The experimental results show that when the jammer adopts the DRL-based resource allocation strategy, DRL-based networked radar power allocation is significantly better than particle swarm-based and artificial fish swarm-based networked radar power allocation in both target detection probability and run time.
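The abstract does not give the network architectures or the exact action encodings, so the following is only a minimal sketch of the two action types it names: a continuous power allocation vector for the radar agent, and a hybrid action (discrete beam selection plus continuous power split) for the jammer agent. The softmax mapping, the top-k beam selection, and all function and parameter names are assumptions made for illustration and are not taken from the paper. The equal power allocation baseline is included because the abstract lists it as one of the domain-knowledge strategies.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def radar_power_allocation(logits, total_power):
    """Map raw policy outputs to a power vector that sums to total_power.

    Assumption: a softmax is used to enforce the power budget.
    """
    return [total_power * w for w in softmax(logits)]


def equal_power_allocation(num_radars, total_power):
    """Baseline named in the abstract: split the budget evenly."""
    return [total_power / num_radars] * num_radars


def jammer_hybrid_action(beam_scores, power_logits, num_beams, total_power):
    """Hypothetical hybrid action for the jammer agent.

    Discrete part: pick the num_beams radars with the highest scores.
    Continuous part: split total_power among the selected radars only.
    """
    ranked = sorted(range(len(beam_scores)),
                    key=lambda i: beam_scores[i], reverse=True)
    selected = sorted(ranked[:num_beams])
    weights = softmax([power_logits[i] for i in selected])
    powers = [0.0] * len(beam_scores)
    for i, w in zip(selected, weights):
        powers[i] = total_power * w
    return selected, powers


if __name__ == "__main__":
    print(radar_power_allocation([0.2, -0.1, 1.0], total_power=30.0))
    print(equal_power_allocation(3, total_power=30.0))
    print(jammer_hybrid_action([0.1, 2.0, -1.0, 1.5], [0.0] * 4, 2, 8.0))
```

In a trained system the logits and scores would come from the policy networks; here they are plain lists so that the budget constraint and the discrete/continuous split are easy to inspect.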
Authors WANG Yuedong (王跃东); GU Yijing (顾以静); LIANG Yan (梁彦); WANG Zengfu (王增福); ZHANG Huixia (张会霞) (School of Automation, Northwestern Polytechnical University, Xi'an 710072, China; Key Laboratory of Information Fusion Technology, Ministry of Education, Xi'an 710072, China)
Source Journal of Radars (《雷达学报(中英文)》), indexed in EI, CSCD, and the Peking University Core Journals list, 2023, No. 3, pp. 642-656 (15 pages)
Funding National Natural Science Foundation of China (61873205).
Keywords Radar resource management; Escort suppression jamming; Deep Reinforcement Learning (DRL); Detection probability; Deep game; Domain knowledge assisted learning
