Research on Homomorphic Transformation Methods in the HAMs Framework (Cited by: 1)

Research on HAMs-family Based Homomorphisms
Abstract: A main problem of HAMs-family hierarchical reinforcement learning is that its state space is the joint space generated by the cross-product of the machine states of the HAM and the states of the original MDP, and subroutine-based state abstraction does not fully resolve it. This paper analyzes the problem in detail, describes the HAMs model from the viewpoint of "policy-coupled" semi-Markov decision processes, gives a series of formal definitions of HAMs-based homomorphisms, and proves several practical theorems showing that homomorphic transformations can effectively address the problem. On this basis, several important observations on applying homomorphisms to state abstraction are summarized. Finally, a typical example is analyzed and verified with the proposed method.
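For orientation only: the paper's own HAM-specific definitions appear in the full text, not in this record. The block below records just the standard MDP-homomorphism condition from Ravindran and Barto (reference 5 below), which the abstract builds on.

```latex
% Standard MDP homomorphism condition (Ravindran & Barto; reference 5 below).
% h = (f, {g_s}) maps M = <S, A, P, R> onto a reduced model M' = <S', A', P', R'>.
\begin{align}
  h(s, a) &= \bigl(f(s),\, g_s(a)\bigr), \qquad f : S \to S', \quad g_s : A_s \to A'_{f(s)} \\
  P'\bigl(f(s),\, g_s(a),\, f(s')\bigr) &= \sum_{s'' \in f^{-1}(f(s'))} P(s, a, s'') \\
  R'\bigl(f(s),\, g_s(a)\bigr) &= R(s, a)
\end{align}
% In the HAMs setting the "flat" state is a joint state (m, s): a machine
% state m of the HAM paired with an environment state s, so f acts on the
% cross-product space M_H x S rather than on S alone.
```

The sketch below is a minimal, hypothetical illustration, not the paper's construction: the machine-state names and the grid MDP are invented for the example. It only shows how the HAM x MDP cross-product state space arises and how a surjective mapping f can collapse joint states; a true homomorphism would additionally have to satisfy the transition and reward conditions above.

```python
# Illustrative sketch of the joint (machine state, environment state) space
# and a candidate aggregation map f.  All names here are hypothetical.
from itertools import product

machine_states = ["choose", "follow_wall", "exit"]              # HAM machine states
env_states = [(x, y) for x, y in product(range(4), range(4))]   # 4x4 grid MDP

# Joint state space: every (machine state, environment state) pair.
joint_states = list(product(machine_states, env_states))
print(len(joint_states))  # 3 * 16 = 48 joint states

# A hand-picked abstraction f: while the machine is in "follow_wall",
# only the column x is assumed to matter, so all rows y are aggregated.
# Elsewhere the joint state is kept as-is.
def f(joint):
    m, (x, y) = joint
    return (m, x) if m == "follow_wall" else (m, (x, y))

abstract_states = {f(js) for js in joint_states}
print(len(abstract_states))  # 2*16 + 4 = 36 abstract states
```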
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2008, No. 11, pp. 2074-2082 (9 pages)
Funding: National Natural Science Foundation of China, General Program (No. 60503048)
Keywords: hierarchical reinforcement learning; hierarchies of abstract machines; homomorphism

References (20)

  • 1 Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112: 181-211.
  • 2 Parr R. Hierarchical control and learning for Markov decision processes[D]. University of California, Berkeley, California, 1998.
  • 3 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 4 Dean T, Givan R. Model minimization in Markov decision processes[C]. In: AAAI/IAAI, 1997, 106-111.
  • 5 Ravindran B, Barto A G. SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes[C]. In: Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), AAAI Press, 2003, 1011-1016.
  • 6 Ravindran B, Barto A G. Symmetries and model minimization of Markov decision processes[R]. Computer Science Technical Report 01-43, University of Massachusetts, Amherst, MA, 2001.
  • 7 Boutilier C, Dearden R. Using abstractions for decision-theoretic planning with time constraints[C]. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), AAAI Press, 1994, 1016-1022.
  • 8 Hengst B. Variable resolution in hierarchical RL[R]. Technical Report UNSW CSE TR 0309, National ICT Australia, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia, May 2003.
  • 9 Hartmanis J, Stearns R E. Algebraic structure theory of sequential machines[M]. Prentice-Hall, Englewood Cliffs, NJ, 1966.
  • 10 Ravindran B. An algebraic approach to abstraction in reinforcement learning[D]. Doctoral Dissertation, Department of Computer Science, University of Massachusetts, Amherst, MA, 2004.

Co-cited References (26)

  • 1 Silver D, Sutton R, Müller M. Temporal-difference search in computer Go[J]. Machine Learning, 2012, 87: 183-219.
  • 2 Wang F Y, Jin N, Liu D R, et al. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound[J]. IEEE Transactions on Neural Networks, 2011, 22(1): 24-36.
  • 3 Hafner R, Riedmiller M. Reinforcement learning in feedback control: challenges and benchmarks from technical process control[J]. Machine Learning, 2011, 84: 137-169.
  • 4 Choi J, Kim K E. Inverse reinforcement learning in partially observable environments[J]. Journal of Machine Learning Research, 2011, 12: 691-730.
  • 5 Meltzoff A N, Kuhl P K, Movellan J, et al. Foundations for a new science of learning[J]. Science, 2009, 325: 284-288.
  • 6 Kovacs T, Egginton R. On the analysis and design of software for reinforcement learning with a survey of existing systems[J]. Machine Learning, 2011, 84: 7-49.
  • 7 Doshi-Velez F, Pineau J, Roy N. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs[J]. Artificial Intelligence, 2012, 187-188: 115-132.
  • 8 Frommberger L, Wolter D. Structural knowledge transfer by spatial abstraction for reinforcement learning agents[J]. Adaptive Behavior, 2010, 18(6): 531-539.
  • 9 Kozlova O. Hierarchical & factored reinforcement learning[D]. Paris: Université Pierre et Marie Curie, 2010.
  • 10 Guestrin C, Koller D, Parr R, et al. Efficient solution algorithms for factored MDPs[J]. Journal of Artificial Intelligence Research, 2003, 19: 399-468.

Citing Articles (1)

Secondary Citing Articles (1)
