Abstract
To address the problems of large network scale, long training time, and overfitting in reinforcement learning, a memory-prunable bionic reinforcement learning model (H-RLM) is proposed as the learning mechanism for a two-wheeled robot. The algorithm takes the minimum mean-square error between the neural network output and the expected output as its cost function, and combines Hessian-matrix pruning with Markov decision making to search for the optimum, selecting the optimal action corresponding to the maximum evaluation value. This preserves the integrity of the training content during initial network learning, relaxes the system's constraints on initial conditions, and improves the generalization ability of the control algorithm. Speed-tracking experiments on the two-wheeled robot were carried out with both the H-RLM and a standard reinforcement learning algorithm. The results show that the H-RLM algorithm improves network learning efficiency, eliminates delay effects, reduces output error, and achieves good dynamic performance.
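The abstract's Hessian-matrix pruning step can be illustrated with a minimal Optimal-Brain-Surgeon-style sketch: under a mean-squared-error cost, each weight's saliency estimates the cost increase if that weight is removed, and the least salient weight is pruned with a correction to the remaining weights. All function names and the toy Hessian below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def obs_saliencies(weights, hessian):
    """Saliency s_i = w_i^2 / (2 * [H^-1]_ii): estimated increase in the
    mean-squared-error cost if weight i is set to zero."""
    h_inv = np.linalg.inv(hessian)
    return weights ** 2 / (2.0 * np.diag(h_inv))

def prune_least_salient(weights, hessian):
    """Zero the weight with the smallest saliency and apply the OBS
    correction delta_w = -(w_q / [H^-1]_qq) * H^-1[:, q] to the rest."""
    h_inv = np.linalg.inv(hessian)
    saliencies = weights ** 2 / (2.0 * np.diag(h_inv))
    q = int(np.argmin(saliencies))            # least important weight
    delta = -(weights[q] / h_inv[q, q]) * h_inv[:, q]
    pruned = weights + delta
    pruned[q] = 0.0                           # weight q removed exactly
    return pruned, q

w = np.array([0.9, 0.05, -0.4])
H = np.diag([2.0, 4.0, 3.0])                  # toy positive-definite Hessian
new_w, removed = prune_least_salient(w, H)
print(removed)  # → 1 (the small middle weight has the lowest saliency)
```

With a diagonal Hessian the correction only touches the pruned weight itself; with a full Hessian the surviving weights shift to compensate, which is what lets the network shrink without retraining from scratch.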
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal index (北大核心)
2017, No. 15, pp. 141-145 (5 pages)
Funding
National Natural Science Foundation of China (61203343)
Natural Science Foundation of Hebei Province (E2014209106)
Science and Technology Research Project for Higher Education Institutions of the Hebei Education Department (QN2016102, QN2016105)
Graduate Innovation Project of North China University of Science and Technology (2016S10)