

A Q-learning Method for Continuous Space Based on Self-organizing Fuzzy RBF Network
Abstract  For reinforcement learning control problems in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF (radial basis function) network is proposed. The input of the network is the state, and its outputs are a continuous action and the corresponding Q-value, which realizes a mapping from continuous states to continuous actions. First, the continuous action space is discretized into a fixed number of discrete actions, and a fully greedy policy selects the discrete action with the maximum Q-value as the winning local action of each fuzzy rule. A command fusion mechanism then weights the winning local actions of the fuzzy rules by their utility values to produce the continuous action actually applied to the system. In addition, to simplify the network structure and speed up learning, an improved resource-allocating network (RAN) algorithm and a gradient descent algorithm adjust the structure and the parameters of the network online and adaptively, respectively. Simulation results on the balancing control of an inverted pendulum verify the effectiveness of the proposed Q-learning method.
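The mechanism summarized in the abstract can be illustrated with a short sketch. The following Python code is a minimal illustration, not the authors' implementation: it assumes Gaussian membership functions for the fuzzy rules, normalized firing strengths as the utility weights in command fusion, and a simplified two-condition RAN-style novelty test for adding rules. All names and parameters (FuzzyQRBF, eps_novelty, err_novelty, kappa, etc.) are illustrative assumptions.

```python
# Sketch of "continuous state -> continuous action" fuzzy Q-learning:
# each fuzzy rule (Gaussian RBF unit) stores Q-values for a fixed set of
# discretized actions; each rule's greedy action is fused across rules by
# normalized firing strength to yield the continuous action and its Q-value.
import numpy as np

class FuzzyQRBF:
    def __init__(self, state_dim, discrete_actions, alpha=0.05, gamma=0.95,
                 eps_novelty=0.5, err_novelty=0.1, kappa=0.8):
        self.discrete_actions = np.asarray(discrete_actions)  # discretized action set
        self.centers = np.empty((0, state_dim))               # rule centers
        self.widths = np.empty(0)                              # rule widths
        self.q = np.empty((0, len(discrete_actions)))          # per-rule Q-values
        self.alpha, self.gamma = alpha, gamma
        self.eps_novelty, self.err_novelty = eps_novelty, err_novelty
        self.kappa = kappa                                      # assumed overlap factor

    def firing(self, s):
        """Normalized Gaussian firing strength of each fuzzy rule for state s."""
        if len(self.centers) == 0:
            return np.empty(0)
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.widths ** 2))
        return phi / (phi.sum() + 1e-12)

    def act(self, s):
        """Command fusion: weight each rule's greedy discrete action by firing strength."""
        phi = self.firing(s)
        winners = np.argmax(self.q, axis=1)                     # greedy local action per rule
        a = float(phi @ self.discrete_actions[winners])          # fused continuous action
        q = float(phi @ self.q[np.arange(len(winners)), winners])  # fused Q-value
        return a, q, phi, winners

    def grow_if_novel(self, s, td_error):
        """Simplified RAN-style growth: add a rule when the state is far from all
        existing centers AND the prediction (TD) error is large."""
        dist = np.inf if len(self.centers) == 0 else \
            np.min(np.linalg.norm(self.centers - s, axis=1))
        if dist > self.eps_novelty and abs(td_error) > self.err_novelty:
            width = self.kappa * dist if np.isfinite(dist) else 1.0
            self.centers = np.vstack([self.centers, s])
            self.widths = np.append(self.widths, width)
            self.q = np.vstack([self.q, np.zeros(len(self.discrete_actions))])

    def update(self, s, winners, phi, r, q_next, q_pred):
        """Gradient-descent TD update of the winning Q-values, weighted by firing strength."""
        td_error = r + self.gamma * q_next - q_pred
        for i, w in enumerate(winners):
            self.q[i, w] += self.alpha * td_error * phi[i]
        self.grow_if_novel(s, td_error)
        return td_error
```

On each control step one would call act(s) to obtain the fused action and its Q-value, apply the action to the plant (e.g., the inverted pendulum), observe the reward and next state, evaluate act(s_next) for the bootstrap target, and then call update(). This mirrors the discretize / greedy-winner / command-fusion / online-adaptation cycle described in the abstract.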
Source  Information and Control (《信息与控制》), 2008, Issue 1, pp. 1-8 (8 pages). Indexed: CSCD, PKU Core (北大核心).
Funding  Supported by the Doctoral Program Foundation of the Ministry of Education of China (20070290537), the China Postdoctoral Science Foundation (20070411064), the Jiangsu Province Postdoctoral Science Foundation (0601033B), the Qing Lan Project of Jiangsu Province (苏教师[2007]2号), and the Youth Research Fund of China University of Mining and Technology (0C060093).
Keywords  self-organizing; fuzzy RBF (radial basis function) network; continuous space; Q-learning; Q-value

References (12)

1. Watkins C J C H, Dayan P. Technical note: Q-learning [J]. Machine Learning, 1992, 8(3-4): 279-292.
2. Ster B. An integrated learning approach to environment modelling in mobile robot navigation [J]. Neurocomputing, 2004, 57(1-4): 215-238.
3. Touzet C F. Neural reinforcement learning for behaviour synthesis [J]. Robotics and Autonomous Systems, 1997, 22(3-4): 251-281.
4. Santamaria J C, Sutton R S, Ram A. Experiments with reinforcement learning in problems with continuous state and action spaces [J]. Adaptive Behavior, 1997, 6(2): 163-217.
5. Smith A J. Applications of the self-organising map to reinforcement learning [J]. Neural Networks, 2002, 15(8-9): 1107-1124.
6. Wang L X, Mendel J M. Fuzzy basis functions, universal approximation, and orthogonal least-squares [J]. IEEE Transactions on Neural Networks, 1992, 3(5): 807-814.
7. Bao H, Huang X H, Li X X, Mao Z Y. Design of multivariable adaptive fuzzy controllers based on a simplified fuzzy RBF neural network model [J]. Control Theory & Applications, 2000, 17(2): 169-174. (in Chinese)
8. Kim M S, Hong S G, Lee J J. On-line fuzzy Q-learning with extended rule and interpolation technique [A]. Proceedings of the IEEE International Conference on Intelligent Robots and Systems [C]. Piscataway, NJ, USA: IEEE, 1999: 757-762.
9. Platt J. A resource-allocating network for function interpolation [J]. Neural Computation, 1991, 3(2): 213-225.
10. Meesad P, Yen G G. Accuracy, comprehensibility and completeness evaluation of a fuzzy expert system [J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2003, 11(4): 445-466.
