Reinforcement learning (RL) has roots in dynamic programming, and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments attract enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly promote ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has demonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.
To address the output feedback problem for linear discrete-time systems, this work proposes a new adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, merely the measurement of input and output data. More specifically, based on the IMP, the output control problem can first be converted into a stabilization problem. We then design an observer to reproduce the full state of the system by measuring the inputs and outputs. Moreover, this technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. Importantly, with this concept one does not need to solve the regulator equation. Finally, this control method was tested on a grid-connected LCL inverter system to demonstrate that the proposed method provides the desired performance in terms of both tracking and disturbance rejection.
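To make the phrase "policy iteration algorithm ... to determine the optimal feedback gain" concrete, the sketch below runs policy iteration on a scalar linear-quadratic problem. It is only an illustration under stated assumptions: unlike the model-free, output-feedback IMP-ADP method described in the abstract, it assumes the model (`a`, `b`) is known and the full state is measured, and the function name `policy_iteration_lq` and all numerical values are hypothetical.

```python
def policy_iteration_lq(a, b, q, r, k0, iterations=20):
    """Policy iteration for x[k+1] = a*x[k] + b*u[k], u = -k*x,
    with undiscounted stage cost q*x^2 + r*u^2 (scalar case).

    Assumption: the model (a, b) is known; the abstract's method avoids this.
    """
    if abs(a - b * k0) >= 1.0:
        raise ValueError("initial gain k0 must be stabilizing")
    k = k0
    history = []
    for _ in range(iterations):
        closed_loop = a - b * k
        # Policy evaluation: V(x) = p*x^2 solves p = q + r*k^2 + closed_loop^2 * p
        p = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        history.append(p)
        # Policy improvement: minimize r*u^2 + p*(a*x + b*u)^2 over u
        k = a * b * p / (r + b * b * p)
    return p, k, history


if __name__ == "__main__":
    p, k, history = policy_iteration_lq(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
    print("value coefficient:", p, "feedback gain:", k)
```

Each pass evaluates the current gain exactly and then improves it, so the value coefficient decreases toward the solution of the scalar Riccati equation; the initial gain must already be stabilizing, which is the usual requirement of policy iteration.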
An optimal tracking control problem for a class of nonlinear systems with guaranteed performance and asymmetric input constraints is discussed in this paper. The control policy is implemented by an adaptive dynamic programming (ADP) algorithm under two event-based triggering mechanisms. It is often challenging to design an optimal control law due to the system deviation caused by asymmetric input constraints. First, a prescribed performance control technique is employed to keep the tracking errors within predetermined boundaries. Subsequently, considering the asymmetric input constraints, a discounted non-quadratic cost function is introduced. Moreover, in order to reduce controller updates, an event-triggered control law is developed for the ADP algorithm. After that, to further simplify the controller design, this work is extended to a self-triggered case, which relaxes the need for continuous signal monitoring by hardware devices. By employing the Lyapunov method, the uniform ultimate boundedness of all signals is proved. Finally, a simulation example on a mass–spring–damper system subject to asymmetric input constraints is provided to validate the effectiveness of the proposed control scheme.
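The idea of reducing controller updates through an event-triggered rule can be illustrated with a toy simulation. This is a minimal sketch and not the scheme of the abstract: it assumes a scalar linear plant, a fixed feedback gain instead of an ADP-learned policy, and a relative-error triggering condition; the function name `simulate_event_triggered` and every parameter value are hypothetical.

```python
def simulate_event_triggered(a=1.02, b=1.0, gain=0.07, sigma=0.2,
                             x0=1.0, steps=300):
    """Simulate x[k+1] = a*x[k] + b*u[k] with u[k] = -gain * x_held,
    where x_held is refreshed only when a triggering condition fires.

    Assumed triggering rule: update when |x - x_held| > sigma * |x|.
    """
    x = x0
    x_held = x0      # last state sample sent to the controller
    updates = 1      # the initial sample counts as one update
    for _ in range(steps):
        if abs(x - x_held) > sigma * abs(x):
            x_held = x       # event: transmit a fresh sample
            updates += 1
        u = -gain * x_held   # control is held constant between events
        x = a * x + b * u
    return x, updates


if __name__ == "__main__":
    x_final, updates = simulate_event_triggered()
    print("final state:", x_final, "controller updates:", updates)
```

With these values the held-sample error satisfies |x - x_held| <= sigma*|x| whenever the control is applied, so |x[k+1]| <= (0.95 + 0.07*0.2)*|x[k]| and the state still decays, while the controller is refreshed on only a fraction of the time steps.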
This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, because the control input is introduced into the feedback system, the optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is utilized to implement the optimal parallel tracking control using a critic neural network (NN) to approximate the value function online. The stability analysis of the closed-loop system is performed using Lyapunov theory, and the tracking error and NN weight errors are shown to be uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input when there are finite jump discontinuities in the reference signals. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.
This paper concerns a novel optimal self-learning battery sequential control scheme for smart home energy systems. The main idea is to use the adaptive dynamic programming (ADP) technique to obtain the optimal battery sequential control iteratively. First, the battery energy management system model is established, where the power efficiency of the battery is considered. Next, considering the power constraints of the battery, a new non-quadratic performance index function is established, which guarantees that the value of the iterative control law cannot exceed the maximum charging/discharging power of the battery, so as to extend the service life of the battery. Then, the convergence properties of the iterative ADP algorithm are analyzed, which guarantees that the iterative value function and the iterative control law both reach their optima. Finally, simulation and comparison results are given to illustrate the performance of the presented method.
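The abstract does not state the non-quadratic performance index it uses. As an assumption, the sketch below takes a form that is common in the constrained-input ADP literature, W(u) = 2*r*u_max * integral from 0 to u of atanh(v/u_max) dv, and shows why such a penalty keeps the control within the bound: its minimizer is a scaled tanh. The function names, the symbol `lam` (standing for the gradient term that multiplies u in the Bellman minimization), and all values are hypothetical.

```python
import math


def nonquadratic_cost(u, u_max, r=1.0):
    """Closed form of W(u) = 2*r*u_max * integral_0^u atanh(v/u_max) dv.

    Assumed penalty form; defined only for |u| < u_max.
    """
    s = u / u_max
    if abs(s) >= 1.0:
        raise ValueError("requires |u| < u_max")
    return (2.0 * r * u_max * u * math.atanh(s)
            + r * u_max ** 2 * math.log(1.0 - s * s))


def bounded_minimizer(lam, u_max, r=1.0):
    """Minimizer of W(u) + lam*u over u, from W'(u) = 2*r*u_max*atanh(u/u_max).

    Its magnitude never exceeds u_max because |tanh| <= 1.
    """
    return -u_max * math.tanh(lam / (2.0 * r * u_max))


if __name__ == "__main__":
    for lam in (0.5, 5.0, 50.0):
        print(lam, bounded_minimizer(lam, u_max=2.0))
```

However large the gradient term becomes, the resulting control saturates smoothly at the power limit instead of being clipped after the fact, which is the property the abstract attributes to its performance index.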
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations (i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman’s backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality inevitably arises in DP for both instant resource requests and future resource reservations. To address this scalability issue, an approximate dynamic programming (ADP) technique based on linear function approximations is applied. Several examples are provided to show the effectiveness of the proposed approach.
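Bellman's backward principle for a dynamic pricing problem can be sketched on a deliberately small example. The code below is an illustration under simplifying assumptions, not the reservation model of the abstract: it has a single resource pool, at most one request per period, no reservations or overbooking, and a hypothetical price list with hypothetical purchase probabilities. The name `backward_pricing` is likewise made up for this sketch.

```python
def backward_pricing(capacity, horizon, price_to_prob):
    """Finite-horizon stochastic DP for pricing `capacity` units over `horizon`
    periods. In each period one customer arrives and buys at price p with
    probability price_to_prob[p] (assumed demand model).

    Returns (V, policy), where V[t][c] is the optimal expected revenue-to-go
    and policy[t][c] is the maximizing price (None when c == 0).
    """
    V = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]  # V[horizon][.] = 0
    policy = [[None] * (capacity + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):        # backward in time
        for c in range(1, capacity + 1):        # c == 0: nothing left to sell
            best_value, best_price = float("-inf"), None
            for price, prob in price_to_prob.items():
                value = (prob * (price + V[t + 1][c - 1])
                         + (1.0 - prob) * V[t + 1][c])
                if value > best_value:
                    best_value, best_price = value, price
            V[t][c] = best_value
            policy[t][c] = best_price
    return V, policy


if __name__ == "__main__":
    demand = {1.0: 0.8, 2.0: 0.5, 3.0: 0.2}    # hypothetical price -> buy probability
    V, policy = backward_pricing(capacity=3, horizon=5, price_to_prob=demand)
    print("expected revenue from the start:", V[0][3])
    print("first-period price:", policy[0][3])
```

The table `V` has one entry per (period, remaining capacity) pair, so its size grows quickly once several resource types and reservation states are tracked together; that growth is the curse of dimensionality that motivates replacing the table with a linear function approximation.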
The convergence and stability of a value-iteration-based adaptive dynamic programming (ADP) algorithm are considered for discrete-time nonlinear systems with a discounted quadratic performance index. Beyond achieving a good approximation structure, the iterative feedback control law must guarantee closed-loop stability. Specifically, it is first proved that the iterative value function sequence converges precisely to the optimum. Second, the necessary and sufficient condition for the optimal value function to serve as a Lyapunov function is investigated. We prove that, for the infinite-horizon case, there exists a finite horizon length for which the iterative feedback control law provides stability, and this increases the practicability of the proposed value iteration algorithm. Neural networks (NNs) are employed to approximate the value functions and the optimal feedback control laws, and the approach allows the implementation of the algorithm without knowing the internal dynamics of the system. Finally, a simulation example is employed to demonstrate the effectiveness of the developed optimal control method.
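The convergence of the iterative value function sequence can be seen in closed form for a scalar linear system with a discounted quadratic cost. The sketch below is an illustration only: the abstract treats nonlinear systems with neural-network approximators and unknown internal dynamics, whereas this code assumes a known linear model so that every iterate is exactly V_i(x) = p_i * x^2. The function name `value_iteration_lq` and the parameter values are hypothetical.

```python
def value_iteration_lq(a, b, q, r, gamma, iterations=200):
    """Value iteration for x[k+1] = a*x[k] + b*u[k] with stage cost
    q*x^2 + r*u^2 and discount factor gamma, started from V_0 = 0.

    Each iterate is V_i(x) = p_i * x^2, so only the scalar p_i is updated.
    """
    p = 0.0
    history = [p]
    for _ in range(iterations):
        # Bellman update: min over u of q*x^2 + r*u^2 + gamma*p*(a*x + b*u)^2
        p = (q + gamma * a * a * p
             - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p))
        history.append(p)
    gain = gamma * a * b * p / (r + gamma * b * b * p)  # u = -gain * x
    return p, gain, history


if __name__ == "__main__":
    p, gain, history = value_iteration_lq(a=1.2, b=1.0, q=1.0, r=1.0, gamma=0.95)
    print("converged value coefficient:", p)
    print("closed-loop pole:", 1.2 - 1.0 * gain)
```

Starting from zero, the sequence p_i is nondecreasing and settles at the fixed point of the update. Early iterates can produce gains that do not stabilize an unstable plant, which is why the abstract's question of how many iterations are needed before the iterative control law is stabilizing matters in practice.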
In this paper, a data-based fault tolerant control (FTC) scheme is investigated for unknown continuous-time (CT) affine nonlinear systems with actuator faults. First, a neural network (NN) identifier based on particle swarm optimization (PSO) is constructed to model the unknown system dynamics. By utilizing the estimated system states, the particle swarm optimized critic neural network (PSOCNN) is employed to solve the Hamilton-Jacobi-Bellman equation (HJBE) more efficiently. Then, a data-based FTC scheme, which consists of the NN identifier and the fault compensator, is proposed to achieve actuator fault tolerance. The stability of the closed-loop system under actuator faults is guaranteed by the Lyapunov stability theorem. Finally, simulations are provided to demonstrate the effectiveness of the developed method.
In this paper, an adaptive dynamic programming (ADP) strategy is investigated for discrete-time nonlinear systems with unknown nonlinear dynamics subject to input saturation. To save the communication resources between the controller and the actuators, stochastic communication protocols (SCPs) are adopted to schedule the control signal, and therefore the closed-loop system is essentially a protocol-induced switching system. A neural network (NN)-based identifier with a robust term is exploited for approximating the unknown nonlinear system, and a set of switch-based updating rules for the NN weights, with an additional tunable parameter, is developed with the help of gradient descent. By virtue of a novel Lyapunov function, a sufficient condition is proposed to achieve the stability of both the system identification errors and the update dynamics of the NN weights. Then, an offline value-iteration ADP algorithm is proposed to solve the optimal control of protocol-induced switching systems with saturation constraints, and its convergence is analyzed by mathematical induction. Furthermore, an actor-critic NN scheme is developed to approximate the control law and the proposed performance index function in the framework of ADP, and the stability of the closed-loop system is analyzed using Lyapunov theory. Finally, numerical simulation results are presented to demonstrate the effectiveness of the proposed control scheme.
In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using a single-network approximate dynamic programming (ADP) scheme, where only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators in the weight tuning laws, the requirement for initial stabilizing control policies is removed. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop a value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function is elaborated for the special case of linear systems. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with that of the traditional approaches.
Nonlinear loads in the power distribution system cause non-sinusoidal currents and voltages with harmonic components. Shunt active filters (SAFs) with current-controlled voltage source inverters (CCVSIs) are usually used to obtain balanced and sinusoidal source currents by injecting compensation currents. However, CCVSIs with traditional controllers have limited transient and steady-state performance. In this paper, we propose an adaptive dynamic programming (ADP) controller with online learning capability to improve the transient response and harmonic performance. The proposed controller works alongside existing proportional integral (PI) controllers to efficiently track the reference currents in the d-q domain. It can generate adaptive control actions to compensate the PI controller. The proposed system was simulated under different nonlinear (three-phase full-wave rectifier) load conditions. The performance of the proposed approach was compared with that of the traditional approach. We have also included the simulation results without connecting the traditional PI-control-based power inverter for reference. The online-learning-based ADP controller not only reduced the average total harmonic distortion by 18.41%, but also outperformed traditional PI controllers during transients.
This study deals with reliable control problems in data-driven cyber-physical systems (CPSs) with intermittent communication faults, where the faults may be caused by bad or broken communication devices and/or cyber attackers. To solve them, a watermark-based anomaly detector is first proposed, with which the faults are classified as either detectable or undetectable. Second, the intermittent characteristic of the faults is described by an average dwell-time (ADT)-like concept, and then the reliable control issues under faults that are undetectable to the detector are converted into stabilization issues of switched systems. Furthermore, based on the identifier-critic-structure learning algorithm, a data-driven switched controller with a prescribed-performance-based switching law is proposed, and a tolerated fault set is given by the ADT approach. Additionally, it is shown that the presented switching laws can mitigate the system performance degradation in asynchronous intervals, where the degradation is caused by the fault-maker-triggered switching rule, which is unknown to CPS operators. Finally, an illustrative example validates the proposed method.
This paper proposes a novel virtual inertia controller for converters in power systems, which can handle the system’s nonlinearity for frequency support. First, the system dynamics are formulated as a nonlinear state-space model, in which the reciprocal of inertia is modeled as the control input. Correspondingly, a cost function is defined by considering the frequency deviation and the rate of change of the frequency, which preserves a tradeoff between critical frequency limits and the respective control energy. Next, the optimal frequency regulation problem is solved by using an online adaptive dynamic programming method, where the actor and critic neural networks are constructed to approximate the optimal control input and the optimal cost function, respectively. After that, a small-signal analysis is provided to assess the stability of the converter under the proposed controller. Finally, simulation results verify that the frequency response of the system is significantly improved, while retaining more DC-side energy.
We investigate the optimization of linear impulse systems with the reinforcement learning based adaptive dynamic programming (ADP) method. For linear impulse systems, the optimal objective function is shown to be a quadratic form of the pre-impulse states. The ADP method provides solutions that iteratively converge to the optimal objective function. If an initial guess of the pre-impulse objective function is selected as a quadratic form of the pre-impulse states, the objective function iteratively converges to the optimal one through ADP. Though direct use of the quadratic objective function of the states within the ADP method is theoretically possible, a numerical singularity problem may occur due to the matrix inversion therein when the system dimensionality increases. A neural network based ADP method can circumvent this problem. A neural network with polynomial activation functions is selected to approximate the pre-impulse objective function and is trained iteratively using the ADP method to achieve optimal control. After successful training, the optimal impulse control can be derived. Simulations are presented for illustrative purposes.
Funding (survey of ADP and RL for advanced control): supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Funding (IMP-ADP output feedback control): supported by the National Science Fund for Distinguished Young Scholars (62225303), the Fundamental Research Funds for the Central Universities (buctrc202201), the China Scholarship Council, and the High Performance Computing Platform, College of Information Science and Technology, Beijing University of Chemical Technology.
Funding (event-triggered optimal tracking control with asymmetric input constraints): supported in part by the National Natural Science Foundation of China (62033003, 62003093, 62373113, U23A20341, U21A20522) and the Natural Science Foundation of Guangdong Province, China (2023A1515011527, 2022A1515011506).
Funding (optimal parallel tracking control): supported in part by the National Key Research and Development Program of China (2018AAA0101502, 2018YFB1702300), in part by the National Natural Science Foundation of China (61722312, 61533019, U1811463, 61533017), and in part by the Intel Collaborative Research Institute for Intelligent and Automated Connected Vehicles.
Funding (battery sequential control for smart home energy systems): supported in part by the National Natural Science Foundation of China (61533017, 61273140, 61304079, 61374105, 61379099, 61233001), the Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3), and the Open Research Project from SKLMCCS (20150104).
Funding (data-based fault tolerant control): supported in part by the National Natural Science Foundation of China (61533017, 61973330, 61773075, 61603387), the Early Career Development Award of SKLMCCS (20180201), and the State Key Laboratory of Synthetical Automation for Process Industries (2019-KF-23-03).
Funding (ADP under stochastic communication protocols): supported in part by the Australian Research Council Discovery Early Career Researcher Award (DE200101128) and the Australian Research Council (DP190101557).
Abstract: In this paper, an adaptive dynamic programming (ADP) strategy is investigated for discrete-time nonlinear systems with unknown nonlinear dynamics subject to input saturation. To save communication resources between the controller and the actuators, stochastic communication protocols (SCPs) are adopted to schedule the control signal, so the closed-loop system is essentially a protocol-induced switching system. A neural network (NN)-based identifier with a robust term is used to approximate the unknown nonlinear system, and a set of switch-based updating rules for the NN weights, with an additional tunable parameter, is developed using gradient descent. By virtue of a novel Lyapunov function, a sufficient condition is proposed to achieve the stability of both the system identification errors and the update dynamics of the NN weights. Then, an offline value-iteration ADP algorithm is proposed to solve the optimal control problem of protocol-induced switching systems with saturation constraints, and its convergence is analyzed in detail by mathematical induction. Furthermore, an actor-critic NN scheme is developed to approximate the control law and the proposed performance index function in the framework of ADP, and the stability of the closed-loop system is analyzed using Lyapunov theory. Finally, numerical simulation results are presented to demonstrate the effectiveness of the proposed control scheme.
Abstract: In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), in which only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators in the weight tuning laws, initial stabilizing control policies are no longer required. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
Funding: This work was supported in part by the Beijing Natural Science Foundation (JQ19013), the National Key Research and Development Program of China (2021ZD0112302), and the National Natural Science Foundation of China (61773373).
Abstract: The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop a value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function is elaborated for the special case of linear systems. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with that of the traditional approaches.
Abstract: Nonlinear loads in the power distribution system cause non-sinusoidal currents and voltages with harmonic components. Shunt active filters (SAFs) with current-controlled voltage source inverters (CCVSIs) are usually used to obtain balanced and sinusoidal source currents by injecting compensation currents. However, CCVSIs with traditional controllers have limited transient and steady-state performance. In this paper, we propose an adaptive dynamic programming (ADP) controller with online learning capability to improve transient response and harmonics. The proposed controller works alongside existing proportional-integral (PI) controllers to efficiently track the reference currents in the d-q domain. It can generate adaptive control actions to compensate the PI controller. The proposed system was simulated under different nonlinear (three-phase full-wave rectifier) load conditions. The performance of the proposed approach was compared with that of the traditional approach. We have also included the simulation results without connecting the traditional PI-control-based power inverter for reference comparison. The online-learning-based ADP controller not only reduced average total harmonic distortion by 18.41%, but also outperformed traditional PI controllers during transients.
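The total harmonic distortion (THD) figure quoted in this abstract is commonly defined as the root-sum-square of the harmonic amplitudes divided by the fundamental amplitude. The abstract does not state how THD was measured, so the following is only a minimal sketch of that common definition. It assumes the sampled window covers an integer number of fundamental periods, and the function name `total_harmonic_distortion` and the `max_harmonic` cutoff are hypothetical.

```python
import numpy as np

def total_harmonic_distortion(signal, fs, f0, max_harmonic=40):
    """Estimate THD = sqrt(sum_{h>=2} A_h^2) / A_1 from an FFT.
    Assumes the window spans an integer number of periods of f0, so that
    every harmonic falls exactly on an FFT bin (no leakage)."""
    n = len(signal)
    # Single-sided amplitude spectrum (valid for non-DC, non-Nyquist bins)
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / n
    bin_width = fs / n

    def amplitude(h):
        k = int(round(h * f0 / bin_width))
        return spectrum[k] if k < len(spectrum) else 0.0

    fundamental = amplitude(1)
    harmonics = np.array([amplitude(h) for h in range(2, max_harmonic + 1)])
    return np.sqrt(np.sum(harmonics ** 2)) / fundamental
```

As a usage example, a 50 Hz current with 5th and 7th harmonics of relative amplitude 0.2 and 0.1, sampled at 10 kHz over 0.2 s, has a THD of sqrt(0.2² + 0.1²), roughly 22.4%.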
Funding: Supported in part by the National Natural Science Foundation of China (61873056, 61473068, 61273148, 61621004, 61420106016), the Fundamental Research Funds for the Central Universities in China (N170405004, N182608004), and the Research Fund of State Key Laboratory of Synthetical Automation for Process Industries in China (2013ZCX01).
Abstract: This study deals with reliable control problems in data-driven cyber-physical systems (CPSs) with intermittent communication faults, where the faults may be caused by bad or broken communication devices and/or cyber attackers. First, a watermark-based anomaly detector is proposed, under which the faults are classified as either detectable or undetectable. Second, the intermittent characteristic of the faults is described by an average dwell-time (ADT)-like concept, and the reliable control issues under faults that are undetectable to the detector are then converted into stabilization issues of switched systems. Furthermore, based on the identifier-critic-structure learning algorithm, a data-driven switched controller with a prescribed-performance-based switching law is proposed, and a tolerated fault set is given using the ADT approach. Additionally, it is shown that the presented switching laws can mitigate the system performance degradation in asynchronous intervals, where the degradation is caused by the fault-maker-triggered switching rule, which is unknown to CPS operators. Finally, an illustrative example validates the proposed method.
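For readers unfamiliar with the average dwell-time idea mentioned in this abstract, the standard ADT condition from switched-systems theory requires that the number of switches in any interval [t1, t2] be at most N0 + (t2 - t1)/τa, where τa is the average dwell time and N0 is a chatter bound. The sketch below checks that standard condition on a finite list of switching instants. It is not the paper's "ADT-like" fault model, whose details the abstract does not give; the function name `satisfies_adt` and the closed-interval counting convention are assumptions.

```python
def satisfies_adt(switch_times, tau_a, n0):
    """Return True if the switching instants satisfy the average dwell-time
    bound N(t1, t2) <= n0 + (t2 - t1) / tau_a on every interval.
    For a finite list it suffices to test intervals that start and end at
    switching instants (counted inclusively, an assumed convention)."""
    times = sorted(switch_times)
    for i in range(len(times)):
        for j in range(i, len(times)):
            count = j - i + 1                      # switches in [t_i, t_j]
            if count > n0 + (times[j] - times[i]) / tau_a:
                return False
    return True
```

For instance, switches at 1, 3, 5, and 7 s satisfy the bound with τa = 2 s and N0 = 1, whereas three switches within 0.2 s do not unless N0 is raised to at least 3.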
Funding: Supported by the National Key Research and Development Program of China [2018YFA0702200]; National transformative subject: Intelligent evolution mechanism and design of distributed information energy system; National Natural Science Foundation of China [62073065, 51907098]; China Postdoctoral Science Foundation [2020T130337].
Abstract: This paper proposes a novel virtual inertia controller for converters in power systems, which can handle the system's nonlinearity for frequency support. First, the system dynamics are formulated as a nonlinear state-space model, in which the reciprocal of inertia is modeled as the control input. Correspondingly, a cost function is defined by considering the frequency deviation and the rate of change of frequency, which preserves a tradeoff between critical frequency limits and the respective control energy. Next, the optimal frequency regulation problem is solved using an online adaptive dynamic programming method, where actor and critic neural networks are constructed to approximate the optimal control input and the optimal cost function, respectively. After that, a small-signal analysis is provided to assess the stability of the converter under the proposed controller. Finally, simulation results verify that the frequency response of the system is significantly improved, while retaining more DC-side energy.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61104006, 51175319, and 11202121), the MOE Scientific Research Foundation for the Returned Overseas Chinese Scholars, the Natural Science Foundation of Shanghai (No. 11ZR1412400), and the Shanghai Education Commission (Nos. 12YZ010, 12JC1404100, and 11CH-05), China.
Abstract: We investigate the optimization of linear impulse systems with the reinforcement-learning-based adaptive dynamic programming (ADP) method. For linear impulse systems, the optimal objective function is shown to be a quadratic form of the pre-impulse states. The ADP method provides solutions that iteratively converge to the optimal objective function. If an initial guess of the pre-impulse objective function is selected as a quadratic form of the pre-impulse states, the objective function iteratively converges to the optimal one through ADP. Though direct use of the quadratic objective function of the states within the ADP method is theoretically possible, a numerical singularity problem may occur due to the matrix inversion therein when the system dimensionality increases. A neural-network-based ADP method can circumvent this problem. A neural network with polynomial activation functions is selected to approximate the pre-impulse objective function and is trained iteratively using the ADP method to achieve optimal control. After successful training, the optimal impulse control can be derived. Simulations are presented for illustrative purposes.