Deep Q Network Combining Memory Network and Dynamic Discount Coefficient
Abstract: Deep reinforcement learning addresses the problem of interacting with an environment so that an agent can make continuous sequential decisions. Traditional reinforcement learning algorithms rest on the Markov decision process, in which the future state depends only on the current state; this ignores the important influence that memory of the decision sequence exerts on the current decision. In addition, the reward discount coefficient is a fixed value, which makes it difficult to describe the dynamic influence of the current reward and the expected future reward on decisions at different stages of training. By combining the deep Q network with a recurrent memory network, this work brings the long-term memory of the decision sequence into the decision process, and by setting a dynamic discount coefficient it assigns an appropriate discount to the deep Q network model at each training stage, thereby accelerating the convergence of the deep Q network model and improving its performance.
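The paper's exact architecture and schedule are not reproduced on this page, but the abstract describes two components: a deep Q network whose action values are conditioned on a recurrent memory of the observation sequence, and a discount coefficient that varies with the training stage. Below is a minimal PyTorch sketch of both ideas; the class and function names, the LSTM-based memory, and the linear gamma-annealing schedule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """DQN variant whose encoder feeds an LSTM, so the Q-values can
    condition on the memory of the whole observation sequence."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim)
        z = self.encoder(obs_seq)
        z, hidden_state = self.lstm(z, hidden_state)
        # Q-values for every step in the sequence, plus the LSTM state
        return self.q_head(z), hidden_state


def dynamic_gamma(step: int, total_steps: int,
                  gamma_start: float = 0.90, gamma_end: float = 0.99) -> float:
    """Hypothetical schedule: grow the discount coefficient from a
    short-sighted value early in training toward a far-sighted one,
    so early updates emphasise the current reward and later updates
    weight the expected future return more heavily."""
    frac = min(step / total_steps, 1.0)
    return gamma_start + frac * (gamma_end - gamma_start)


def td_target(reward, next_q_max, done, step, total_steps):
    """Standard DQN TD target, but with the stage-dependent discount:
    y = r + gamma(step) * max_a' Q_target(s', a')."""
    gamma = dynamic_gamma(step, total_steps)
    return reward + gamma * next_q_max * (1.0 - done)
```

In this sketch the replay buffer would store short observation sequences rather than single transitions, so the LSTM state can be rebuilt at training time; that detail, like the schedule endpoints, is an assumption rather than something stated in the abstract.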
Authors: ZHONG Zhen (钟榛); YAN Qishuai (闫启帅). Affiliation: Chengdu SIWI Power Electronic Technology Co., Ltd., Chengdu, Sichuan 610097
Source: Henan Science and Technology (《河南科技》), 2021, No. 22, pp. 34-37 (4 pages)
Keywords: deep reinforcement learning; memory neural network; deep Q network; dynamic discount factor