Abstract
Based on a factored representation of states, a new SARSA(λ) reinforcement learning algorithm is proposed. The algorithm derives a state-similarity heuristic from the features of each state and uses this heuristic to cluster (aggregate) the state space, which greatly reduces the complexity of searching and computing over the state space. The algorithm is therefore well suited to solving Markov decision processes (MDPs) with large state spaces.
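The abstract does not specify the similarity heuristic, so the following Python sketch only illustrates the general idea under stated assumptions. It runs tabular SARSA(λ) with replacing eligibility traces in which Q-values and traces are indexed by a cluster of states rather than by individual states. The corridor environment, the distractor `noise` feature, the `aggregate` heuristic (states that agree on a chosen subset of features are treated as similar), the linear ε schedule, and all parameter values are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical toy domain (not from the paper): a corridor of N cells.
# A factored state is (position, noise); only `position` affects dynamics
# and reward, while `noise` is a distractor redrawn uniformly each step.
N, NOISE_LEVELS = 8, 5
GOAL = N - 1
ACTIONS = (-1, +1)  # move left, move right


def step(state, action, rng):
    """Apply an action; every step costs -1 and the episode ends at GOAL."""
    pos, _ = state
    nxt = min(max(pos + action, 0), GOAL)
    return (nxt, rng.randrange(NOISE_LEVELS)), -1.0, nxt == GOAL


def aggregate(state, relevant=(0,)):
    """Assumed similarity heuristic: states that agree on the `relevant`
    features are considered similar and share one cluster."""
    return tuple(state[i] for i in relevant)


def greedy(Q, cluster):
    """Greedy action for a cluster (ties go to the first action in ACTIONS)."""
    return max(ACTIONS, key=lambda a: Q[(cluster, a)])


def sarsa_lambda(episodes=300, alpha=0.1, gamma=1.0, lam=0.8,
                 eps0=0.2, explore_frac=0.8, seed=0):
    """Tabular SARSA(lambda) with replacing traces over clusters, not raw states."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # keyed by (cluster, action)

    for ep in range(episodes):
        # Linearly decay epsilon to 0 over the first `explore_frac` of training.
        eps = eps0 * max(0.0, 1.0 - ep / (explore_frac * episodes))

        def act(cluster):
            # Epsilon-greedy selection on the aggregated Q-values.
            return rng.choice(ACTIONS) if rng.random() < eps else greedy(Q, cluster)

        e = defaultdict(float)  # eligibility traces, also keyed by (cluster, action)
        state = (0, rng.randrange(NOISE_LEVELS))
        c = aggregate(state)
        a = act(c)
        while True:
            nxt, r, done = step(state, a, rng)
            c2 = aggregate(nxt)
            a2 = None if done else act(c2)
            target = r if done else r + gamma * Q[(c2, a2)]
            delta = target - Q[(c, a)]
            e[(c, a)] = 1.0  # replacing trace
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            if done:
                break
            state, c, a = nxt, c2, a2
    return Q


def greedy_rollout(Q, max_steps=100, seed=1):
    """Follow the greedy policy from position 0; return steps to GOAL or None."""
    rng = random.Random(seed)
    state = (0, rng.randrange(NOISE_LEVELS))
    for t in range(1, max_steps + 1):
        state, _, done = step(state, greedy(Q, aggregate(state)), rng)
        if done:
            return t
    return None


if __name__ == "__main__":
    Q = sarsa_lambda()
    clusters = {c for (c, _) in Q}
    print(f"{len(clusters)} clusters vs {N * NOISE_LEVELS} raw states")
    print("greedy steps to goal:", greedy_rollout(Q))
```

In this toy setting the heuristic ignores only an irrelevant feature, so the 35 non-terminal factored states collapse into 7 clusters with no loss of information. In a realistic problem a similarity heuristic would usually merge states whose values differ, and the aggregated Q-values would then only approximate the true ones.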
Source
Journal of Computer Research and Development
2001, No. 1, pp. 88-92 (5 pages)
EI
CSCD
Peking University Core Journal
Keywords
reinforcement learning
state aggregation
Markov decision processes (MDPs)
SARSA(λ) learning