Application of reinforcement learning in multi-period weapon portfolio planning problems (cited by: 2)
Abstract: To address the difficulty of selecting and planning weapon portfolios over multiple periods, a hybrid optimization approach combining a multi-objective optimization algorithm with reinforcement learning is proposed. For each period, a single-period multi-objective optimization model is built to maximize portfolio capability and minimize cost, and a solving algorithm based on the non-dominated sorting genetic algorithm-III (NSGA-III) is designed to generate the Pareto set for that period. A multi-period portfolio optimization model is then built on these Pareto sets. Q-Learning, a reinforcement learning method, selects a portfolio from each period's Pareto set in either exploration or exploitation mode, and each selection guides the selection in the next period, yielding a plan for the entire planning horizon. Comparative experiments verify the effectiveness of the proposed model and algorithm, which can support decision making in multi-period weapon portfolio planning.
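The two-step idea in the abstract (per-period Pareto sets, then Q-Learning over them) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: a brute-force non-dominated filter stands in for NSGA-III, the state is the pair (period, remaining budget), the reward is the chosen portfolio's capability with a fixed penalty when the cost exceeds the remaining budget, and the names `pareto_front` and `q_learning_plan` are hypothetical.

```python
import random


def pareto_front(points):
    """Keep non-dominated (capability, cost) pairs: higher capability, lower cost.

    Brute-force stand-in for NSGA-III (an assumption, not the paper's solver).
    """
    front = []
    for i, (cap_i, cost_i) in enumerate(points):
        dominated = any(
            cap_j >= cap_i and cost_j <= cost_i and (cap_j > cap_i or cost_j < cost_i)
            for j, (cap_j, cost_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((cap_i, cost_i))
    return front


def q_learning_plan(fronts, budget, episodes=2000, alpha=0.1, gamma=0.9,
                    epsilon=0.2, seed=0):
    """Pick one portfolio per period from its Pareto set with tabular Q-Learning.

    Hypothetical setup: state = (period, remaining budget), costs are integers,
    reward = capability if affordable, otherwise a fixed penalty.
    """
    rng = random.Random(seed)
    q = {}  # (period, remaining budget, action index) -> estimated value

    def best_action(t, remaining):
        return max(range(len(fronts[t])), key=lambda a: q.get((t, remaining, a), 0.0))

    def step(t, remaining, a):
        cap, cost = fronts[t][a]
        if cost <= remaining:
            return cap, remaining - cost
        return -10.0, remaining  # unaffordable choice: penalty, budget unchanged

    for _ in range(episodes):
        remaining = budget
        for t in range(len(fronts)):
            # Exploration with probability epsilon, exploitation otherwise.
            if rng.random() < epsilon:
                a = rng.randrange(len(fronts[t]))
            else:
                a = best_action(t, remaining)
            reward, nxt = step(t, remaining, a)
            if t + 1 < len(fronts):
                future = max(q.get((t + 1, nxt, b), 0.0)
                             for b in range(len(fronts[t + 1])))
            else:
                future = 0.0
            key = (t, remaining, a)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + gamma * future - old)
            remaining = nxt

    # Greedy rollout with the learned Q-table gives the multi-period plan.
    plan, remaining = [], budget
    for t in range(len(fronts)):
        a = best_action(t, remaining)
        plan.append(fronts[t][a])
        _, remaining = step(t, remaining, a)
    return plan


# Illustrative data only: two periods of candidate (capability, cost) pairs.
candidates = [[(5, 3), (4, 3), (3, 1)], [(6, 4), (2, 1), (1, 2)]]
fronts = [pareto_front(c) for c in candidates]
print(q_learning_plan(fronts, budget=5))
```

Carrying the remaining budget in the state is one simple way to let a choice in one period constrain the next; the paper's actual state, action, and reward definitions may differ.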
Authors: ZHANG Xiaoxiong (张骁雄); DING Song (丁松); LI Minghao (李明浩); DING Kun (丁鲲); WANG Long (王龙); YI Yujiang (义余江) (The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China; School of Economics, Zhejiang University of Finance & Economics, Hangzhou 310018, China; College of Systems Engineering, National University of Defense Technology, Changsha 410073, China; Southwest Electronics and Telecommunication Technology Research Institute, Chengdu 610041, China)
Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2021, No. 5, pp. 127-136 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (71901215, 71901191); Scientific Research Program of National University of Defense Technology (ZK20-46).
Keywords: weapon portfolio planning; non-dominated sorting genetic algorithm-III; reinforcement learning; Q-Learning