Abstract
When a submarine defends against incoming torpedoes underwater, the weakly observable underwater environment makes the acquired target information sparse. Evasive maneuvering is an important tactical method of submarine underwater defense, and setting the maneuvering parameters is a key part of submarine tactical decision-making. Existing methods for simulating and optimizing evasive maneuvering parameters inevitably introduce observation errors during modeling and lack a means of responding to the evolving situation. Moreover, because military experts are scarce, tactical confrontation samples labeled by military experts are very expensive to obtain. To address these difficulties, a semi-supervised intelligent decision-making method is proposed that combines an autoencoder with an active Q-learning strategy. A contrastive predictive coding autoencoder is introduced to maximize the mutual information between the time series input and its context, which improves the representation of sparse time series input. The learned representation is then combined with an active reinforcement learning task to reduce the agent's label demand rate and to improve its ability to respond to environmental feedback when making evasion decisions. God-perspective (complete-information) and red-perspective datasets are constructed from replay data of commanders' tactical training exercises collected over three years. Experimental results show that, in ablation experiments against a variant without the sparse time series autoencoder, the proposed algorithm reaches decision accuracies of 98% and 78% under the complete-information and red-perspective conditions, respectively, with label demand rates of only 4% and 44%. Compared with classical time series classification algorithms, decision accuracy improves by 14% and 9%. Compared with the supervised algorithm, with the label demand rate reduced to 24%~44% of the original, decision accuracy differs by only 1%. These results indicate that the proposed algorithm can greatly reduce label demand while maintaining decision accuracy, providing a general technical framework for military intelligent decision-making with few samples.
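The abstract does not state the training objective used for the contrastive predictive coding encoder. As a rough illustration only, the sketch below assumes the standard InfoNCE loss commonly associated with contrastive predictive coding, where minimizing the loss L over N candidates raises the lower bound log(N) - L on the mutual information between context and future input. The function name `info_nce_loss`, the bilinear matrix `W`, and all array shapes are hypothetical choices for this sketch, not details taken from the paper.

```python
import numpy as np

def info_nce_loss(context, future, W):
    """InfoNCE loss for a single prediction step (illustrative sketch).

    context: (N, d_c) array of context vectors c_t.
    future:  (N, d_z) array of encoded future samples; row i is the
             positive sample for context i, all other rows are negatives.
    W:       (d_z, d_c) bilinear prediction matrix (hypothetical name).
    """
    pred = context @ W.T                     # (N, d_z) predicted future codes
    scores = pred @ future.T                 # (N, N) log-scores for every pair
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; average their negative log-probability.
    return -np.mean(np.diag(log_softmax))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((8, 4))
    z /= np.linalg.norm(z, axis=1, keepdims=True)   # unit-norm codes
    loss = info_nce_loss(z, z, 10.0 * np.eye(4))
    # log(N) - loss is the InfoNCE lower bound on mutual information.
    print("loss:", loss, "MI lower bound:", np.log(len(z)) - loss)
```

With an all-zero `W` every candidate scores equally and the loss equals log(N), so the bound is zero. A predictor that ranks the true future sample above the negatives pushes the loss below log(N).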
Authors
杨静
吴金平
刘剑
王永洁
董汉权
YANG Jing; WU Jinping; LIU Jian; WANG Yongjie; DONG Hanquan (Navy Submarine College, Qingdao 266041, Shandong, China)
Source
《兵工学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 10, pp. 3474-3487 (14 pages)
Acta Armamentarii
Keywords
submarine evasion defense
sparse labels
active Q-learning
self coding
intelligent decision-making