To address the inability of conventional PID controllers to tune their own parameters, the structure and learning algorithm of an adaptive PID controller based on reinforcement learning were proposed. Actor-Critic learning was used to tune the PID parameters adaptively, exploiting the model-free, online learning properties of reinforcement learning. To reduce storage requirements and improve learning efficiency, a single RBF neural network was used to approximate both the policy function of the Actor and the value function of the Critic. The inputs of the RBF network are the system error and its first- and second-order differences. The Actor maps the system state to the PID parameters, while the Critic evaluates the Actor's output and produces the TD error. Based on a TD-error performance index and the gradient descent method, updating rules for the RBF kernel functions and the network weights were derived. Simulation results show that the proposed controller is effective for complex nonlinear systems and achieves better adaptability and robustness than a conventional PID controller.
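A minimal sketch of the scheme the abstract describes, not the authors' code: a single RBF network whose Gaussian hidden layer is shared by two output heads, an Actor head mapping the state (error, first difference, second difference) to the gains Kp, Ki, Kd, and a Critic head estimating the value function, with the TD error driving gradient-descent updates of both weight sets. The plant model, reward, learning rates, Gaussian-exploration Actor update, and RBF layout below are illustrative assumptions, and the paper's updates to the RBF centers and widths are omitted for brevity.

```python
# Sketch of an Actor-Critic adaptive PID controller with a shared RBF network.
# All hyperparameters and the plant are assumptions, not the paper's values.
import numpy as np

class ActorCriticPID:
    def __init__(self, n_hidden=10, gamma=0.95, lr_actor=0.02, lr_critic=0.05):
        self.gamma, self.lr_a, self.lr_c = gamma, lr_actor, lr_critic
        # RBF centers over the 3-D state: (error, first diff, second diff)
        self.centers = np.random.uniform(-1, 1, (n_hidden, 3))
        self.widths = np.full(n_hidden, 1.0)
        self.w_actor = np.random.uniform(0, 0.1, (n_hidden, 3))  # -> Kp, Ki, Kd
        self.w_critic = np.zeros(n_hidden)                       # -> V(s)

    def phi(self, s):
        # Gaussian RBF activations, shared by Actor and Critic heads
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2 * self.widths ** 2))

    def gains(self, s):
        # Actor head: state -> PID parameters (kept non-negative)
        return np.maximum(self.phi(s) @ self.w_actor, 1e-3)

    def value(self, s):
        # Critic head: state -> value estimate
        return self.phi(s) @ self.w_critic

    def update(self, s, s_next, reward, exploration):
        # TD error drives gradient-descent updates of both weight sets
        delta = reward + self.gamma * self.value(s_next) - self.value(s)
        p = self.phi(s)
        self.w_critic += self.lr_c * delta * p
        # Gaussian-exploration Actor update: reinforce perturbations
        # that produced a positive TD error
        self.w_actor += self.lr_a * delta * np.outer(p, exploration)
        return delta

# Usage on an assumed first-order discrete plant y[k+1] = 0.9 y[k] + 0.1 u[k]
agent = ActorCriticPID()
y, e_prev, e_prev2, u = 0.0, 0.0, 0.0, 0.0
for k in range(200):
    e = 1.0 - y                                   # setpoint = 1.0
    s = np.array([e, e - e_prev, e - 2 * e_prev + e_prev2])
    noise = np.random.normal(0, 0.01, 3)          # Gaussian exploration
    kp, ki, kd = agent.gains(s) + noise
    # Incremental PID law on the error and its differences
    u += kp * (e - e_prev) + ki * e + kd * (e - 2 * e_prev + e_prev2)
    y_next = 0.9 * y + 0.1 * u                    # plant step
    e_next = 1.0 - y_next
    s_next = np.array([e_next, e_next - e, e_next - 2 * e + e_prev])
    agent.update(s, s_next, reward=-e_next ** 2, exploration=noise)
    y, e_prev2, e_prev = y_next, e_prev, e
print(f"final output y = {y:.3f}")
```

Sharing one hidden layer is what cuts the storage cost: only the two output weight matrices differ between Actor and Critic, so the RBF activations are computed once per step and reused by both heads.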
Funding: Project 0601033B supported by the Science Foundation for Post-doctoral Scientists of Jiangsu Province; Projects 0C4466 and 0C060093 supported by the Scientific and Technological Foundation for Youth of China University of Mining & Technology.