Abstract
Reinforcement learning (RL) and performance potential theory are active research topics in artificial intelligence (AI), and RoboCup soccer simulation provides a good experimental platform for AI and robotics research. To address the unstable solution process and slow convergence that arise when RL and performance potential theory are applied to soccer simulation, this work proposes a new RL algorithm, GA*-learning: an A* average-reward RL algorithm based on performance potentials. GA*-learning adds a heuristic function to G-learning, the average-reward RL algorithm based on performance potentials, and selects actions according to a heuristic policy, which speeds up convergence. GA*-learning is applied to keepaway, a simplified robot soccer domain, and the simulation results show that the algorithm effectively improves system performance and convergence speed.
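The abstract does not give the exact form of the heuristic function or of the G-learning update. As a rough illustration only, the idea of letting a heuristic influence action choice could be sketched as below. The additive bonus `xi * H(s, a)` is an assumption borrowed from the common heuristically accelerated RL pattern, not the authors' confirmed rule, and the names `select_action`, `q_values`, `heuristic`, `xi`, and `epsilon` are all hypothetical.

```python
import random


def select_action(q_values, heuristic, xi=1.0, epsilon=0.1, rng=random):
    """Return an action index chosen by a heuristic-biased epsilon-greedy rule.

    q_values  : learned value estimates for each action in the current state
    heuristic : heuristic bonus for each action (assumed form, not from the paper)
    xi        : weight of the heuristic relative to the learned values
    epsilon   : probability of picking a uniformly random action
    """
    n = len(q_values)
    if rng.random() < epsilon:
        # Explore: ignore both the learned values and the heuristic.
        return rng.randrange(n)
    # Exploit: rank actions by learned value plus weighted heuristic bonus.
    scores = [q + xi * h for q, h in zip(q_values, heuristic)]
    return max(range(n), key=scores.__getitem__)


# The heuristic favors action 0 even though action 1 has the highest value.
biased = select_action([0.2, 0.5, 0.1], [1.0, 0.0, 0.0], xi=1.0, epsilon=0.0)
# With xi = 0 the rule reduces to plain greedy selection.
greedy = select_action([0.2, 0.5, 0.1], [1.0, 0.0, 0.0], xi=0.0, epsilon=0.0)
```

In this sketch the heuristic only changes which action is tried, not the value estimates themselves, so a poor heuristic would slow learning rather than corrupt what is learned.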
Source
Computer Simulation (《计算机仿真》)
CSCD
PKU Core Journal (北大核心)
2014, No. 7, pp. 338-341 (4 pages)
Keywords
Reinforcement learning
Performance potentials
Heuristic search
Semi-Markov decision process