
Study on the A* Average Reward Reinforcement Learning Algorithm Based on Performance Potentials (Cited by: 2)
Abstract: Reinforcement learning (RL) and performance potential theory are active research topics in artificial intelligence (AI), and RoboCup soccer simulation provides a good experimental platform for AI and robotics research. To address the unstable solution process and slow convergence of RL with performance potentials in soccer-simulation applications, this paper proposes a new RL algorithm, the performance-potential-based A* average-reward reinforcement learning algorithm (GA*-learning). GA*-learning adds a heuristic function to the performance-potential-based average-reward RL algorithm (G-learning); the heuristic policy guides action selection and thereby accelerates convergence. GA*-learning is applied to keepaway, a simplified robot-soccer domain, and the simulation results show that the algorithm effectively improves system performance and convergence speed.
Source: Computer Simulation (《计算机仿真》), CSCD, Peking University Core Journal, 2014, No. 7, pp. 338-341 (4 pages).
Keywords: Reinforcement learning; Performance potentials; Heuristic search; Semi-Markov decision process
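
As a rough illustration of the idea summarized in the abstract, the minimal sketch below pairs a tabular average-reward update (in the spirit of G-learning, with a running average-reward estimate standing in for the performance-potential baseline) with action selection biased by a heuristic function, which is the essence of GA*-learning. The function names, the heuristic weight xi, the step sizes, and the toy smoke test are illustrative assumptions, not the authors' implementation.

# Minimal, illustrative sketch (not the paper's code): tabular average-reward
# learning with heuristically biased action selection.
import random
from collections import defaultdict


def select_action(Q, H, state, actions, epsilon=0.1, xi=1.0):
    """Pick the action maximizing Q(s,a) + xi * H(s,a), with epsilon-greedy
    exploration. H is the heuristic function that steers the choice."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)] + xi * H(state, a))


def g_learning_update(Q, rho, state, action, reward, next_state, actions,
                      alpha=0.1, beta=0.01):
    """One average-reward temporal-difference update (G-learning-style sketch).
    rho is the running estimate of the average reward; the TD error uses
    (reward - rho) instead of a discount factor."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward - rho + best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    rho += beta * (reward - rho)  # track the average-reward estimate
    return rho


if __name__ == "__main__":
    # Smoke test on a fabricated transition, just to show the call shape.
    Q = defaultdict(float)
    actions = ["hold", "pass1", "pass2"]

    def H(state, action):
        # Trivial placeholder heuristic; a real one would encode domain
        # knowledge, e.g. how open the receiving teammate is in keepaway.
        return 0.0

    a = select_action(Q, H, "s0", actions)
    rho = g_learning_update(Q, 0.0, "s0", a, reward=1.0,
                            next_state="s1", actions=actions)
    print(a, Q[("s0", a)], rho)

In a keepaway-style task, the heuristic H might, for example, favor passing to the teammate farthest from the nearest taker, so that exploration is steered toward plausible actions while the average-reward values are still being learned.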
