Abstract
To address frequency-hopping anti-jamming communication decisions in complex electromagnetic environments, this paper proposes a new mixed deep recurrent Q-network (MixDRQN) decision algorithm. The algorithm combines the strengths of the double deep Q-network (DoubleDQN) and the dueling deep Q-network (DuelingDQN), and adds a long short-term memory (LSTM) layer at the signal-processing front end to strengthen the network's ability to extract time-correlated features from the input spectrum waterfall signal. The study shows that DoubleDQN resolves the Q-value overestimation arising from the ε-greedy algorithm, while DuelingDQN together with the front-end LSTM layer effectively learns the time-correlated features of the spectrum waterfall input. Experimental results show that both convergence speed and anti-jamming performance improve significantly under multiple types of jamming signals, with convergence more than 8 times faster than existing algorithms.
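The two mechanisms the abstract combines can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-based Q-value representation, and the numbers are assumptions chosen for illustration, and the LSTM front end is omitted. The first function shows the usual dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A); the second shows the DoubleDQN target, in which the online network selects the next action and the target network evaluates it.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a).

    Uses the mean-subtracted form Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: List[float],
    q_target_next: List[float],
    done: bool = False,
) -> float:
    """Compute the DoubleDQN learning target for one transition.

    The online network's Q-values pick the next action; the target
    network's Q-values score that action. Decoupling selection from
    evaluation is what reduces over-estimation.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical values: three candidate hopping channels as actions.
q_online_next = dueling_q_values(1.0, [0.5, -0.5, 0.0])    # [1.5, 0.5, 1.0]
q_target_next = dueling_q_values(1.0, [0.25, 0.5, -0.75])  # [1.25, 1.5, 0.25]
target = double_dqn_target(1.0, 0.5, q_online_next, q_target_next)
print(target)  # 1.625
```

With these numbers a plain DQN target would take the maximum of the target network's values (1.5) and give 1.75, whereas the DoubleDQN target uses the value of the action chosen by the online network (1.25) and gives 1.625.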
Authors
Xia Chongyang (夏重阳)
Zhang Jianshu (张剑书)
Wu Xiaofu (吴晓富)
Jin Yue (靳越)
Affiliations: College of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Computer Engineering, Nanjing Institute of Engineering, Nanjing 211167, China
Source
Electronic Measurement Technology (《电子测量技术》)
Peking University Core Journal (北大核心)
2023, No. 20, pp. 50-57 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61771256)