Abstract
For reinforcement learning control problems in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF (radial basis function) network is proposed. The input of the network is the state, and the outputs are a continuous action and its Q-value, thus realizing a mapping from a continuous state space to a continuous action space. First, the continuous action space is discretized into a fixed number of discrete actions, and a fully greedy policy selects the discrete action with the maximum Q-value as the local winning action of each fuzzy rule. Then, a command fusion mechanism weights the winning local actions according to their utility values, producing the continuous action actually applied to the system. Moreover, to simplify the network structure and improve the learning speed, an improved resource allocating network (RAN) algorithm and a gradient descent algorithm are applied to adjust the structure and the parameters of the fuzzy RBF network, respectively, in an online and adaptive manner. Simulation results on the balancing control of an inverted pendulum verify the effectiveness of the proposed Q-learning method.
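To make the action-selection step concrete, the following is a minimal Python sketch of the per-rule greedy winner and the command-fusion weighting described above. It assumes Gaussian RBF rule activations and weights each rule's winning discrete action by its normalized activation, which is one plausible reading of the paper's utility-value weighting; all names here (gaussian_firing, fused_action, q_table) are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def gaussian_firing(state, centers, widths):
    """Firing strength of each fuzzy rule, modeled as a Gaussian RBF.

    centers, widths: arrays of shape (n_rules, n_dims); state: (n_dims,).
    (Assumed form; the paper's rule activations may differ.)
    """
    d2 = np.sum(((state - centers) / widths) ** 2, axis=1)
    return np.exp(-d2)

def fused_action(state, centers, widths, q_table, actions):
    """Greedy local winner per rule, fused into one continuous action.

    q_table[i, j]: Q-value of discrete action j under fuzzy rule i.
    actions: the vector of discretized action values.
    """
    phi = gaussian_firing(state, centers, widths)        # rule activations
    winners = np.argmax(q_table, axis=1)                 # fully greedy per rule
    local_a = actions[winners]                           # local winning actions
    local_q = q_table[np.arange(len(winners)), winners]  # their Q-values
    w = phi / (phi.sum() + 1e-12)                        # normalized weights
    a = float(np.dot(w, local_a))                        # fused continuous action
    q = float(np.dot(w, local_q))                        # fused Q-value estimate
    return a, q

# Example: 4 rules over a 2-D state, 5 discrete actions in [-10, 10].
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(4, 2))
widths = np.full((4, 2), 0.5)
q_table = rng.normal(size=(4, 5))
actions = np.linspace(-10.0, 10.0, 5)
a, q = fused_action(np.array([0.1, -0.2]), centers, widths, q_table, actions)
```

Because the fused action is a weighted average of per-rule winners, it varies smoothly with the state even though each rule's choice is discrete; the RAN-based structure adaptation and gradient-descent parameter tuning mentioned in the abstract are not sketched here.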
Source
Information and Control (《信息与控制》)
Indexed in CSCD and the Peking University Core Journals list (北大核心)
2008, No. 1, pp. 1-8 (8 pages)
Funding
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20070290537)
China Postdoctoral Science Foundation (20070411064)
Jiangsu Province Postdoctoral Science Foundation (0601033B)
Qing Lan Project of Jiangsu Province (苏教师[2007]2号)
Youth Scientific Research Foundation of China University of Mining and Technology (0C060093)
Keywords
self-organizing
fuzzy RBF (radial basis function) network
continuous space
Q-learning
Q-value