摘要
针对部分可观测马氏决策过程(POMDP)中,由于感知混淆现象的存在,利用Sarsa等算法得到的无记忆策略可能发生振荡的现象,研究了一种基于记忆的强化学习算法——CPnSarsa(λ)学习算法来解决该问题.它通过重新定义状态,Agent结合观测历史来识别混淆状态.将CPnSarsa(λ)算法应用到一些典型的POMDP,最后得到的是最优或近似最优策略.与以往算法相比,该算法的收敛速度有了很大提高.
In partially observable markov decision processes (POMDP), due to perceptual aliasing, the memoryless policies obtained by Sarsa-learning may oscillate. A memory-based new reinforcement learning algorithm-CpnSarsa (A) is studied to solve this problem. With new definitions of states, the agent combines current observation with preobservations to distinguish aliasing states. With application of the algorithm to some typical POMDP, the optimal or almost-optimal policies are obtained. Comparing with previous algorithms, this algorithm greatly improves the convergence rate.
出处
《控制与决策》
EI
CSCD
北大核心
2004年第11期1263-1266,共4页
Control and Decision
基金
国家自然科学基金重点项目(60234030)
青年科学基金资助项目(60303012).