
Multi-step temporal difference learning algorithm based on recursive least-squares method (cited by: 5)
Abstract: Reinforcement learning is an important machine learning method. To speed up convergence and reduce the error of value-function estimation during learning, a multi-step temporal difference learning algorithm based on the recursive least-squares method, RLS-TD(λ), is proposed. Building on RLS-TD(0), it is proved that under certain conditions the algorithm's weights converge with probability 1 to a unique solution, and the relation that the value-function estimation error must satisfy is derived and proved. Experiments on a maze problem show that the algorithm converges faster than RLS-TD(0) and, compared with the conventional TD(λ) algorithm, reduces the value-function estimation error and thus improves precision.
Source: Computer Engineering and Applications (计算机工程与应用; CSCD, Peking University core journal), 2010, No. 8: 52-55 (4 pages).
Keywords: reinforcement learning; temporal difference; recursive least squares (RLS); convergence; RLS-TD(λ) algorithm
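
The abstract describes the algorithm only at a high level. For orientation, below is a minimal sketch of the standard RLS-TD(λ) update for a linearly approximated value function V(s) = wᵀφ(s), the family to which the paper's algorithm belongs. The class name, hyperparameter defaults, and the unit forgetting factor are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    class RLSTDLambda:
        """Sketch of an RLS-TD(lambda) learner for a linear value
        function V(s) = w . phi(s). Names and defaults are illustrative."""

        def __init__(self, n_features, gamma=0.95, lam=0.8, delta=100.0):
            self.gamma = gamma                    # discount factor
            self.lam = lam                        # trace-decay parameter lambda
            self.w = np.zeros(n_features)         # weight vector being learned
            self.z = np.zeros(n_features)         # eligibility trace
            self.P = delta * np.eye(n_features)   # inverse correlation matrix, P_0 = delta*I

        def update(self, phi, reward, phi_next):
            """Process one transition (s, r, s'): phi = phi(s), phi_next = phi(s')."""
            # Accumulate the eligibility trace: z <- gamma*lambda*z + phi(s)
            self.z = self.gamma * self.lam * self.z + phi
            # Feature difference used by least-squares TD: d = phi(s) - gamma*phi(s')
            d = phi - self.gamma * phi_next
            # Rank-one (Sherman-Morrison) update of P avoids any matrix inversion
            Pz = self.P @ self.z
            k = Pz / (1.0 + d @ Pz)               # RLS gain vector
            td_error = reward - d @ self.w        # equals r + gamma*V(s') - V(s)
            self.w = self.w + k * td_error        # recursive least-squares step
            self.P = self.P - np.outer(k, d @ self.P)
            return td_error

        def value(self, phi):
            return phi @ self.w

In an episodic task such as the paper's maze experiment, update would be called once per step along the trajectory, with the eligibility trace z reset to zero and the terminal feature vector taken as all-zeros at episode boundaries. The O(n²) rank-one update of P is what lets recursive least-squares methods converge in fewer samples than gradient-based TD(λ).
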

References (13)

  • 1 Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Cambridge, MA: MIT Press, 1998.
  • 2 Li Jun, Pan Qishu, Hong Bingrong. A case-based reasoning approach to multi-agent reinforcement learning[J]. Robot (机器人), 2009, 31(4): 320-326. (Cited by: 4)
  • 3 Syafiie S, Tadeo F, Martinez E. Model-free learning control of neutralization processes using reinforcement learning[J]. Engineering Applications of Artificial Intelligence, 2007, 20(6): 762-782.
  • 4 Wang Xuesong, Tian Xilan, Cheng Yuhu, Yi Jianqiang. Q-learning based on cooperative least squares support vector machine[J]. Acta Automatica Sinica, 2009, 35(2): 214-219. (Cited by: 20)
  • 5 Wang Xue-song, Cheng Yu-hu, Yi Jian-qiang. A fuzzy actor-critic reinforcement learning network[J]. Information Sciences, 2007, 177(18): 3764-3781.
  • 6 Samuel A L. Some studies in machine learning using the game of checkers[J]. IBM Journal of Research and Development, 1959, 3: 211-229.
  • 7 Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
  • 8 Tsitsiklis J N, Van Roy B. An analysis of temporal-difference learning with function approximation[J]. IEEE Transactions on Automatic Control, 1997, 42(5): 674-690.
  • 9 Yu H, Bertsekas D. Convergence results for some temporal difference methods based on least squares[J]. IEEE Transactions on Automatic Control, 2009, 54(7): 1515-1531.
  • 10 Gao Yang, Chen Shifu, Lu Xin. A survey of research on reinforcement learning[J]. Acta Automatica Sinica, 2004, 30(1): 86-100. (Cited by: 268)

