Abstract
In recent years, reinforcement learning has increasingly demonstrated its powerful learning ability. In 2017, AlphaGo defeated the world champion at Go, and in the complex competitive games StarCraft II and Dota 2 top human teams were likewise beaten by AI. However, reinforcement learning has its own weaknesses, and bottlenecks have gradually emerged as it continues to develop. Because hierarchical reinforcement learning can address the curse of dimensionality, it handles environments that are more complex and have larger action spaces more effectively, and research on it has been heating up in recent years. This paper briefly introduces the basic theory of reinforcement learning, describes the three classical hierarchical reinforcement learning algorithms Option, HAMs, and MAXQ, then surveys and analyzes hierarchical reinforcement learning algorithms proposed in recent years from three aspects, and discusses the development prospects and challenges of hierarchical reinforcement learning.
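Since the abstract cites the Option framework as one of the classical hierarchical reinforcement learning methods, a minimal illustrative sketch of the standard option triple ⟨I, π, β⟩ (initiation set, intra-option policy, termination condition) is given below. The names `Option`, `run_option`, and the Gym-style `env.step` interface are assumptions for illustration, not the surveyed paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Set

# Minimal sketch of the Option abstraction <I, pi, beta> from the
# classical options framework:
#   I    -- initiation set: states in which the option may be invoked
#   pi   -- intra-option policy: maps a state to a primitive action
#   beta -- termination condition: probability of stopping in a state
@dataclass
class Option:
    initiation_set: Set[Any]
    policy: Callable[[Any], Any]          # pi(s) -> a
    termination: Callable[[Any], float]   # beta(s) in [0, 1]

def run_option(env, state, option, max_steps=100):
    """Execute one option until its termination condition fires.

    `env` is assumed to expose a Gym-style step(action) returning
    (next_state, reward, done, info); this interface is an assumption
    made here for illustration only.
    """
    assert state in option.initiation_set, "option not available in this state"
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = option.policy(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
        # Stop when the episode ends or beta(s) says to terminate.
        if done or random.random() < option.termination(state):
            break
    return state, total_reward, steps
```

A higher-level policy then chooses among such temporally extended options rather than primitive actions, which is how hierarchical methods shrink the effective decision space referred to in the abstract.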
Authors
赖俊
魏竞毅
陈希亮
LAI Jun; WEI Jingyi; CHEN Xiliang (College of Command Information System, Army Engineering University, Nanjing 210007, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2021, No. 3, pp. 72-79 (8 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (61806221).
Keywords
hierarchical reinforcement learning
sub-policy sharing
multi-layer hierarchical structure
automatic hierarchy construction