
分层强化学习中的Option自动生成算法 (Cited by: 5)

Option Automatic Generation in Hierarchical Reinforcement Learning
Abstract: Hierarchical reinforcement learning currently has three main approaches, namely Option, HAM, and MAXQ, and the open problem of generating hierarchies automatically has not been solved well for any of them. Targeting the first approach, this paper presents an algorithm for automatic Option generation. The algorithm takes as input the state space explored by the Agent in the initial learning phase and clusters those states using an artificial immune network. On each clustered state subset, intra-option policies are then learned by an experience replay procedure, and from these the Options are generated. Simulation experiments demonstrate the validity of the algorithm.
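The pipeline the abstract describes (cluster the states explored early in learning, learn an intra-option policy on each cluster by experience replay, then package each cluster and policy as an Option) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: plain k-means stands in for the artificial-immune-network clustering, and the function names (`cluster_states`, `learn_intra_policy`, `make_option`) are hypothetical.

```python
import random
from collections import defaultdict

def cluster_states(states, k, iters=20, seed=0):
    # Crude k-means over 2-D states; a stand-in for the paper's
    # artificial-immune-network clustering step.
    rng = random.Random(seed)
    centers = rng.sample(states, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in states:
            i = min(range(k),
                    key=lambda c: (s[0] - centers[c][0]) ** 2
                                + (s[1] - centers[c][1]) ** 2)
            groups[i].append(s)
        centers = [(sum(x for x, _ in g) / len(g),
                    sum(y for _, y in g) / len(g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

def learn_intra_policy(transitions, actions, gamma=0.9, alpha=0.5, replays=50):
    # Tabular Q-learning driven by repeatedly replaying a stored batch of
    # (s, a, r, s') transitions, i.e. experience replay on past experience
    # instead of fresh environment interaction.
    Q = defaultdict(float)
    for _ in range(replays):
        for s, a, r, s2 in transitions:
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
    return {s: max(actions, key=lambda a: Q[(s, a)])
            for s in {t[0] for t in transitions}}

def make_option(cluster, policy):
    # An Option is a triple (I, pi, beta): initiation set, intra-option
    # policy, and termination condition (here: terminate on leaving I).
    members = set(cluster)
    return {"I": members,
            "pi": policy,
            "beta": lambda s: 0.0 if s in members else 1.0}
```

On a toy corridor of states 0..5 with reward for entering state 5, `learn_intra_policy` recovers a policy that always moves right, and `make_option` wraps the cluster and policy into an Option that terminates once the Agent leaves the cluster.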
Source: 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, Peking University Core, 2005, No. 34: 4-6, 15 (4 pages).
Funding: a ministerial basic research program project.
Keywords: hierarchical reinforcement learning, Option, artificial immune network, experience replay

References (13)

  • 1 高阳, 陈世福, 陆鑫. A survey of reinforcement learning research [J]. 自动化学报 (Acta Automatica Sinica), 2004, 30(1): 86-100. (Cited by: 268)
  • 2 A G Barto, S Mahadevan. Recent Advances in Hierarchical Reinforcement Learning [J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 3 R S Sutton, D Precup, S P Singh. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning [J]. Artificial Intelligence, 1999, 112(1-2): 181-211.
  • 4 R Parr. Hierarchical Control and Learning for Markov Decision Processes [D]. PhD thesis, University of California, Berkeley, 1998.
  • 5 T G Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition [J]. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 6 B L Digney. Learning Hierarchical Control Structures for Multiple Tasks and Changing Environments [C]. In: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, Zurich, Switzerland, 1998: 321-330.
  • 7 A McGovern, A Barto. Autonomous Discovery of Subgoals in Reinforcement Learning Using Diverse Density [C]. In: Proceedings of the Eighteenth International Conference on Machine Learning, San Francisco: Morgan Kaufmann, 2001: 361-368.
  • 8 I Menache, S Mannor, N Shimkin. Q-Cut: Dynamic Discovery of Subgoals in Reinforcement Learning [C]. In: Lecture Notes in Computer Science, Vol. 2430, Springer, 2002: 295-306.
  • 9 S Mannor et al. Dynamic Abstraction in Reinforcement Learning via Clustering [C]. In: Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Canada, 2004: 560-567.
  • 10 D Precup. Temporal Abstraction in Reinforcement Learning [D]. PhD dissertation, University of Massachusetts, Amherst, 2000.

Secondary References (4)

Co-citations (267)

Co-cited Literature (25)

  • 1 沈晶, 顾国昌, 刘海波. A survey of hierarchical reinforcement learning research [J]. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 2005, 18(5): 574-581. (Cited by: 7)
  • 2 苏畅, 高阳, 陈世福, 陈兆乾. Research on an algorithm for autonomously generating Options in SMDP environments [J]. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 2005, 18(6): 679-684. (Cited by: 9)
  • 3 王本年, 高阳, 陈兆乾, 谢俊元, 陈世福. A k-clustering subgoal discovery algorithm for Options [J]. 计算机研究与发展 (Journal of Computer Research and Development), 2006, 43(5): 851-855. (Cited by: 8)
  • 4 沈晶, 顾国昌, 刘海波. A multi-agent based automatic Option generation algorithm [J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2006, 1(1): 84-87. (Cited by: 2)
  • 5 沈晶, 顾国昌, 刘海波. A new hierarchical reinforcement learning method [J]. 计算机应用 (Journal of Computer Applications), 2006, 26(8): 1938-1939. (Cited by: 1)
  • 6 Sutton R S, Precup D, Singh S P. Between MDPs and Semi-MDPs: a framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1/2): 181-211.
  • 7 Parr R. Hierarchical control and learning for Markov decision processes [D]. Berkeley: University of California, 1998.
  • 8 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition [J]. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 9 McGovern A, Barto A. Autonomous discovery of subgoals in reinforcement learning using diverse density [C]//Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 2001: 361-368.
  • 10 Menache I, Mannor S, Shimkin N. Q-Cut: dynamic discovery of subgoals in reinforcement learning [C]//LNCS 2430: Proc of the 13th ECML, 2002: 295-306.

Citing Literature (5)

Secondary Citing Literature (3)
