Journal Articles
25 articles found.
1. Using approximate dynamic programming for multi-ESM scheduling to track ground moving targets (Cited by 5)
Authors: WAN Kaifang, GAO Xiaoguang, LI Bo, LI Fei. Journal of Systems Engineering and Electronics (SCIE/EI/CSCD), 2018, No. 1, pp. 74-85.
This paper studies the adaptive scheduling problem of multiple electronic support measures (multi-ESM) in a ground moving radar target tracking application. It is a sequential decision-making problem in an uncertain environment. For adaptive selection of appropriate ESMs, we generalize an approximate dynamic programming (ADP) framework to the dynamic case. We define the environment model and the agent model, respectively. To handle the partial observability, we apply the unscented Kalman filter (UKF) algorithm for belief state estimation. To reduce the computational burden, a simulation-based rollout approach with a redesigned base policy is proposed to approximate the long-term cumulative reward, and Monte Carlo sampling is combined into the rollout to estimate the expectation of the rewards. The experiments indicate that our method outperforms other strategies, owing to its better performance in larger-scale problems.
Keywords: sensor scheduling; target tracking; approximate dynamic programming; non-myopic; rollout; belief state
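To make the rollout scheme above concrete, here is a minimal generic sketch: each candidate action is simulated forward under the base policy, and Monte Carlo averaging estimates its long-term reward. The toy one-dimensional tracking problem and all names are illustrative assumptions, not the authors' implementation.

```python
import random

def rollout_action(state, actions, step, base_policy, horizon=10, n_samples=30):
    """Pick the action with the highest Monte Carlo rollout value under
    the base policy (one-step rollout, the core of rollout-style ADP)."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(n_samples):              # Monte Carlo sampling
            s, r = step(state, a)               # simulate the candidate action
            value = r
            for _ in range(horizon - 1):        # then follow the base policy
                s, r = step(s, base_policy(s))
                value += r
            total += value
        if total / n_samples > best_value:
            best_value, best_action = total / n_samples, a
    return best_action

# Toy demo: keep a platform near a target at position 5 on a line;
# the reward is the negative tracking error.
def step(pos, a):
    nxt = pos + a + random.choice([-1, 0, 1])   # noisy transition
    return nxt, -abs(nxt - 5)

base_policy = lambda pos: 1 if pos < 5 else -1  # greedy base policy
print(rollout_action(0, [-1, 0, 1], step, base_policy))
```

A non-myopic horizon and a well-chosen base policy, as the abstract emphasizes, are what keep the one-step lookahead from behaving greedily.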
2. Approximate Dynamic Programming for Stochastic Resource Allocation Problems (Cited by 4)
Authors: Ali Forootani, Raffaele Iervolino, Massimo Tipaldi, Joshua Neilson. IEEE/CAA Journal of Automatica Sinica (SCIE/EI/CSCD), 2020, No. 4, pp. 975-990.
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed that takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., specified and unspecified time-interval reservation requests) and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, the inevitable issue of DP, arises for both instant resource requests and future resource reservations. In particular, an approximate dynamic programming (ADP) technique based on linear function approximations is applied to solve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
Keywords: approximate dynamic programming (ADP); dynamic programming (DP); Markov decision processes (MDPs); resource allocation problem
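Bellman's backward principle of optimality, which the paper exploits for its reservation pricing algorithm, amounts to a finite-horizon backward recursion over a Markov decision process. The random toy MDP below is a stand-in for the paper's reservation model, not its actual formulation.

```python
import numpy as np

# Toy finite-horizon MDP: n_states resource levels and 2 actions
# (0 = reject a request, 1 = accept it).  P[a][s] is the next-state
# distribution under action a; R[s, a] is the immediate revenue.
n_states, n_actions, T = 5, 2, 10
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

V = np.zeros(n_states)                      # terminal condition V_T = 0
policy = np.zeros((T, n_states), dtype=int)
for t in reversed(range(T)):                # Bellman backward recursion
    Q = R + np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    policy[t] = Q.argmax(axis=1)            # optimal decision rule at stage t
    V = Q.max(axis=1)                       # value-to-go from stage t
print("optimal expected revenue by starting state:", np.round(V, 3))
```

When the state space is too large to enumerate, the paper's linear-function-approximation ADP replaces the exact table V with a fitted linear model, which is where the scalability gain comes from.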
3. Two-stage robust optimization of power cost minimization problem in gunbarrel natural gas networks by approximate dynamic programming (Cited by 1)
Authors: Yi-Ze Meng, Ruo-Ran Chen, Tian-Hu Deng. Petroleum Science (SCIE/CAS/CSCD), 2022, No. 5, pp. 2497-2517.
In the short-term operation of a natural gas network, the impact of demand uncertainty is not negligible. To address this issue, we propose a two-stage robust model for the power cost minimization problem in gunbarrel natural gas networks. The demands between pipelines and compressor stations are uncertain with a budget parameter, since it is unlikely that all the uncertain demands reach their maximal deviation simultaneously. In solving the two-stage robust model, we encounter a bilevel problem that is challenging to solve. We formulate it as a multi-dimensional dynamic programming problem and propose approximate dynamic programming methods to accelerate the calculation. Numerical results based on a real network in China show a speedup of 7 times on average, without compromising optimality, compared with the original dynamic programming algorithm. Numerical results also verify the advantage of the robust model over the deterministic model when facing uncertainties. These findings offer short-term operation methods for gunbarrel natural gas network management to handle uncertainties.
Keywords: Natural gas; Gunbarrel gas pipeline networks; Robust optimization; approximate dynamic programming
4. Call for Papers: Journal of Control Theory and Applications Special Issue on Approximate Dynamic Programming and Reinforcement Learning
Journal of Control Theory and Applications (EI), 2010, No. 2, p. 257.
Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
Keywords: Call for papers; Journal of Control Theory and Applications; Special issue on approximate dynamic programming and reinforcement learning
5. Approximate dynamic programming solutions with a single network adaptive critic for a class of nonlinear systems (Cited by 2)
Author: S. N. BALAKRISHNAN. Journal of Control Theory and Applications (EI), 2011, No. 3, pp. 370-380.
Approximate dynamic programming (ADP) implemented with an adaptive critic (AC)-based neural network (NN) structure has evolved as a powerful technique for solving the Hamilton-Jacobi-Bellman (HJB) equations. As interest in ADP and AC solutions escalates, there is a dire need to consider possible enabling factors for their implementation. A typical AC structure consists of two interacting NNs, which is computationally expensive. In this paper, a new architecture, called the 'cost-function-based single network adaptive critic (J-SNAC)', is presented, which eliminates one of the networks in a typical AC structure. This approach is applicable to a wide class of nonlinear systems in engineering. To demonstrate the benefits and the control synthesis with the J-SNAC, two problems have been solved with the AC and J-SNAC approaches. Results are presented that show savings of about 50% in computational cost by the J-SNAC while maintaining the same accuracy as the dual-network structure in solving for optimal control. Furthermore, convergence of the J-SNAC iterations, which reduce to a least-squares problem, is discussed; for linear systems, the iterative process is shown to reduce to solving the familiar algebraic Riccati equation.
Keywords: approximate dynamic programming; Optimal control; Nonlinear control; Adaptive critic; Cost-function-based single network adaptive critic; J-SNAC architecture
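The single-network idea can be stated concretely: for a control-affine system x' = f(x) + g(x)u with control cost u^T R u, the optimal control follows directly from the critic's gradient as u* = -0.5 R^{-1} g(x)^T dJ/dx, so the action network is redundant. A minimal sketch, assuming a known input matrix g(x) and an already trained quadratic critic (both placeholders, not the paper's networks):

```python
import numpy as np

def jsnac_control(x, grad_J, g, R_inv):
    """u* = -0.5 * R^{-1} g(x)^T dJ/dx: the control is recovered from the
    critic's gradient alone, which is what lets a single-network adaptive
    critic drop the action network of the usual two-network structure."""
    return -0.5 * R_inv @ g(x).T @ grad_J(x)

# Illustrative two-state example with an assumed quadratic critic J(x) = x^T P x.
g = lambda x: np.array([[0.0], [1.0]])    # input matrix of x' = f(x) + g(x) u
P = np.array([[2.0, 1.0], [1.0, 2.0]])    # stand-in for a trained critic
grad_J = lambda x: 2.0 * P @ x            # analytic gradient of J
R_inv = np.array([[1.0]])                 # inverse of the control-weight matrix
print(jsnac_control(np.array([0.5, 1.0]), grad_J, g, R_inv))
```

Training the critic itself is the least-squares iteration the abstract refers to; this sketch only shows how the control is recovered once a critic is available.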
6. Policy iteration optimal tracking control for chaotic systems by using an adaptive dynamic programming approach (Cited by 1)
Authors: WEI Qinglai, LIU Derong, XU Yancai. Chinese Physics B (SCIE/EI/CAS/CSCD), 2015, No. 3, pp. 87-94.
A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation one. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neuro-dynamic programming
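Policy iteration, as described above, alternates policy evaluation and greedy policy improvement. The tabular sketch below is generic (the paper implements both steps with neural networks on the transformed regulation problem); P[a] holds the transition matrix under action a and R[s, a] the immediate reward.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Tabular policy iteration: alternate exact policy evaluation and
    greedy policy improvement until the policy stops changing."""
    n_states, n_actions = R.shape
    pi = np.zeros(n_states, dtype=int)
    while True:
        # evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        P_pi = np.array([P[pi[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), pi]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # improvement: act greedily with respect to V
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return pi, V
        pi = new_pi
```

The convergence property stated in the abstract mirrors the tabular fact that each improvement step cannot decrease the value, so the iteration terminates at the optimum.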
7. Transfer-based Approximate Dynamic Programming for Rolling Security-constrained Unit Commitment with Uncertainties
Authors: Jianquan Zhu, Kai Zeng, Jiajun Chen, Wenmeng Zhao, Wenhao Liu, Wenkai Zhu. Protection and Control of Modern Power Systems (SCIE/EI), 2024, No. 5, pp. 42-53.
This paper studies the rolling security-constrained unit commitment (RSCUC) problem with AC power flow and uncertainties. This NP-hard problem is modeled as a Markov decision process, which is then solved by a transfer-based approximate dynamic programming (TADP) algorithm proposed in this paper. Different from traditional approximate dynamic programming (ADP) algorithms, TADP can obtain the commitment states of most units in advance through a decision transfer technique, thus reducing the action space of TADP significantly. Moreover, compared with traditional ADP algorithms, which need to determine the commitment state of each unit, TADP only needs to determine the unit with the smallest on-state probability among all on-state units, further reducing the action space. The proposed algorithm also avoids the iterative update of value functions and the reliance on rolling forecast information, which makes more sense in the rolling decision-making process of RSCUC. Finally, numerical simulations are carried out on a modified IEEE 39-bus system and a real 2778-bus system to demonstrate the effectiveness of the proposed algorithm.
Keywords: Rolling security-constrained unit commitment; approximate dynamic programming; decision transfer; probability-based decision priority criterion; uncertainty
8. Real-time Risk-averse Dispatch of an Integrated Electricity and Natural Gas System via Conditional Value-at-risk-based Lookup-table Approximate Dynamic Programming
Authors: Jianquan Zhu, Guanhai Li, Ye Guo, Jiajun Chen, Haixin Liu, Yuhao Luo, Wenhao Liu. Protection and Control of Modern Power Systems (SCIE/EI), 2024, No. 2, pp. 47-60.
The real-time risk-averse dispatch problem of an integrated electricity and natural gas system (IEGS) is studied in this paper. It is formulated as a real-time conditional value-at-risk (CVaR)-based risk-averse dispatch model in the Markov decision process framework. Because of its stochasticity, nonconvexity and nonlinearity, the model is difficult to solve by traditional algorithms in an acceptable time. To address this non-deterministic polynomial-hard problem, a CVaR-based lookup-table approximate dynamic programming (CVaR-ADP) algorithm is proposed, and the risk-averse dispatch problem is decoupled into a series of tractable subproblems. The line pack is used as the state variable to describe the impact of one period's decision on the future, which facilitates the reduction of load shedding and wind power curtailment. Through the proposed method, real-time decisions can be made according to current information, while the value functions overview the whole optimization horizon to balance the current cost and future risk loss. Numerical simulations indicate that the proposed method can effectively measure and control the risk costs in extreme scenarios. Moreover, decisions can be made within 10 s, which meets the requirement of the real-time dispatch of an IEGS.
Keywords: Integrated electricity and natural gas system; approximate dynamic programming; real-time dispatch; risk-averse; conditional value-at-risk
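Conditional value-at-risk, the risk measure underlying the dispatch model above, has a simple sample-based reading: the expected cost over the worst (1 - alpha) tail of the cost distribution. A generic sketch (not the paper's lookup-table algorithm):

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Sample CVaR_alpha: the mean cost in the worst (1 - alpha) tail."""
    costs = np.asarray(costs)
    var = np.quantile(costs, alpha)          # value-at-risk threshold
    return costs[costs >= var].mean()        # average of the tail beyond VaR

rng = np.random.default_rng(1)
operation_costs = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)
print(f"VaR_95 = {np.quantile(operation_costs, 0.95):.1f}, "
      f"CVaR_95 = {cvar(operation_costs, 0.95):.1f}")
```

Penalizing this tail term alongside the expected cost is what makes a dispatch policy risk-averse rather than risk-neutral.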
9. Chaotic system optimal tracking using data-based synchronous method with unknown dynamics and disturbances
Authors: SONG Ruizhuo, WEI Qinglai. Chinese Physics B (SCIE/EI/CAS/CSCD), 2017, No. 3, pp. 268-275.
We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to update synchronously. According to the tracking error and the reference dynamics, the augmented system is constructed and the optimal tracking control problem is defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. An off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton-Jacobi-Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, control, and disturbance, respectively. The weights of these networks compose the augmented weight matrix, which is proven to be uniformly ultimately bounded (UUB). The convergence of the tracking error system is also proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.
Keywords: adaptive dynamic programming; approximate dynamic programming; chaotic system; zero-sum
10. Policy Iteration for Optimal Control of Discrete-Time Time-Varying Nonlinear Systems (Cited by 1)
Authors: Guangyu Zhu, Xiaolu Li, Ranran Sun, Yiyuan Yang, Peng Zhang. IEEE/CAA Journal of Automatica Sinica (SCIE/EI/CSCD), 2023, No. 3, pp. 781-791.
Aimed at infinite-horizon optimal control problems of discrete time-varying nonlinear systems, a new iterative adaptive dynamic programming algorithm, the discrete-time time-varying policy iteration (DTTV) algorithm, is developed in this paper. The iterative control law is designed to update the iterative value function, which approximates the optimal performance index function. The admissibility of the iterative control law is analyzed. The results show that the iterative value function is non-increasing and convergent to the optimal solution of the Bellman equation. To implement the algorithm, neural networks are employed and a new implementation structure is established, which avoids solving the generalized Bellman equation in each iteration. Finally, the optimal control laws for torsional pendulum and inverted pendulum systems are obtained by using the DTTV policy iteration algorithm, where the mass and pendulum bar length are permitted to be time-varying parameters. The effectiveness of the developed method is illustrated by numerical results and comparisons.
Keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; optimal control; policy iteration; time-varying
11. Discounted Iterative Adaptive Critic Designs With Novel Stability Analysis for Tracking Control (Cited by 9)
Authors: Mingming Ha, Ding Wang, Derong Liu. IEEE/CAA Journal of Automatica Sinica (SCIE/EI/CSCD), 2022, No. 7, pp. 1262-1272.
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework to solve the tracking control problem. Unlike the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function for the special case of linear systems is elaborated. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with those of traditional approaches.
Keywords: Adaptive critic design; adaptive dynamic programming (ADP); approximate dynamic programming; discrete-time nonlinear systems; reinforcement learning; stability analysis; tracking control; value iteration (VI)
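Value iteration, the backbone of the adaptive critic framework above, repeatedly applies the Bellman operator until the value function converges. A minimal tabular, cost-minimizing sketch (illustrative only; the paper's contribution is the new tracking cost function and the stability analysis, not the recursion itself):

```python
import numpy as np

def value_iteration(P, C, gamma=0.9, tol=1e-10):
    """Discounted value iteration for stage costs C[s, a] and transition
    matrices P[a]: repeat V <- min_a [C + gamma * P V] until convergence."""
    n_states, n_actions = C.shape
    V = np.zeros(n_states)
    while True:
        Q = C + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)    # converged cost-to-go and greedy policy
        V = V_new
```

Calling value_iteration(P, C) on any finite MDP returns the converged cost-to-go and the greedy policy; the paper's stability question is precisely when such an iterative value function can certify convergence of the tracking error.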
12. A Novel Distributed Optimal Adaptive Control Algorithm for Nonlinear Multi-Agent Differential Graphical Games (Cited by 5)
Authors: Majid Mazouchi, Mohammad Bagher Naghibi-Sistani, Seyed Kamal Hosseini Sani. IEEE/CAA Journal of Automatica Sinica (SCIE/EI/CSCD), 2018, No. 1, pp. 331-341.
In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), in which only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators into the weight tuning laws, the requirement for initial stabilizing control policies is removed. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
Keywords: approximate dynamic programming (ADP); distributed control; neural networks (NNs); nonlinear differential graphical games; optimal control
13. A Novel Face Recognition Algorithm for Distinguishing Faces with Various Angles (Cited by 3)
Author: Yong-Zhong Lu. International Journal of Automation and Computing (EI), 2008, No. 2, pp. 193-197.
In order to distinguish faces at various angles during face recognition, an algorithm combining approximate dynamic programming (ADP), in the form of action-dependent heuristic dynamic programming (ADHDP), with particle swarm optimization (PSO) is presented. ADP is used to dynamically adjust the values of the PSO parameters. During face recognition, the discrete cosine transform (DCT) is first introduced to reduce negative effects. Then, the Karhunen-Loeve (K-L) transform is used to compress images and decrease the data dimensionality. According to principal component analysis (PCA), the main parts of the vectors are extracted for data representation. Finally, a radial basis function (RBF) neural network, trained by ADP-PSO, is used to recognize the various faces. Experimental results on the ORL Face Database demonstrate the accuracy and efficiency of the method.
Keywords: Face recognition; approximate dynamic programming (ADP); particle swarm optimization (PSO)
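Particle swarm optimization, which the paper tunes with ADHDP, keeps a swarm of candidate solutions attracted toward their personal bests and the global best. The inertia and acceleration weights w, c1, c2 (fixed in this generic sketch, adapted dynamically by ADP in the paper) are exactly the parameters being tuned:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Minimize f over R^dim with a basic particle swarm."""
    rng = np.random.default_rng(42)
    x = rng.uniform(-5, 5, (n_particles, dim))     # particle positions
    v = np.zeros_like(x)                           # particle velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v                                  # move the swarm
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val                # update personal bests
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()   # update the global best
    return gbest, pbest_val.min()

print(pso(lambda z: np.sum(z ** 2), dim=4))        # minimum at the origin
```

In the paper's setting, f would be the RBF network's training loss, with ADHDP adjusting w, c1, and c2 online instead of holding them fixed.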
14. Direct heuristic dynamic programming based on an improved PID neural network (Cited by 2)
Authors: Jian SUN, Feng LIU, Jennie SI, Shengwei MEI. Journal of Control Theory and Applications (EI), 2012, No. 4, pp. 497-503.
In this paper, an improved PID neural network (IPIDNN) structure is proposed and applied to the critic and action networks of direct heuristic dynamic programming (DHDP). As an online learning algorithm of approximate dynamic programming (ADP), DHDP has demonstrated its applicability to large state and control problems. Theoretically, the DHDP algorithm requires access to full state feedback in order to obtain solutions to the Bellman optimality equation. Unfortunately, it is not always possible to access all the states in a real system. This paper proposes a solution by suggesting an IPIDNN configuration for constructing the critic and action networks to achieve output-feedback control. Since this structure can estimate the integrals and derivatives of measurable outputs, more system states are utilized and better control performance is expected. Compared with the traditional PIDNN, this configuration is flexible and easy to expand. Based on this structure, a gradient descent algorithm for the IPIDNN-based DHDP is presented. Convergence issues are addressed within a single learning time step and for the entire learning process. Some important insights are provided to guide the implementation of the algorithm. The proposed learning controller has been applied to a cart-pole system to validate the effectiveness of the structure and the algorithm.
Keywords: approximate dynamic programming (ADP); Direct heuristic dynamic programming (DHDP); Improved PID neural network (IPIDNN)
15. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
Authors: WEI Qinglai, SONG Ruizhuo, SUN Qiuye, XIAO Wendong. Chinese Physics B (SCIE/EI/CAS/CSCD), 2015, No. 9, pp. 147-152.
This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the HJB equation from system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids identification of the system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. To solve the Hamilton-Jacobi-Bellman (HJB) equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method.
Keywords: adaptive dynamic programming; approximate dynamic programming; chaotic system; optimal tracking control
16. State of the Art of Adaptive Dynamic Programming and Reinforcement Learning
Authors: Derong Liu, Mingming Ha, Shan Xue. CAAI Artificial Intelligence Research, 2022, No. 2, pp. 93-110.
This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced, following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed the fast developments of the past decade, from algorithms to convergence and optimality analyses, and to stability results. Several key steps in the recent theoretical developments of ADPRL are mentioned, with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.
Keywords: adaptive dynamic programming; approximate dynamic programming; adaptive critic designs; neuro-dynamic programming; neural dynamic programming; reinforcement learning; intelligent control; learning control; optimal control
17. Adaptive dynamic programming for online solution of a zero-sum differential game (Cited by 10)
Authors: Draguna VRABIE, Frank LEWIS. Journal of Control Theory and Applications (EI), 2011, No. 3, pp. 353-360.
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method that has been developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application. The adaptation goal is the best control policy that will face, in an optimal manner, the highest load disturbance.
Keywords: Approximate/adaptive dynamic programming; Game algebraic Riccati equation; Zero-sum differential game; Nash equilibrium
18. Distributed and Risk-averse ADP Algorithm for Stochastic Economic Dispatch of Power System with Multiple Offshore Wind Farms
Authors: Xiangyong Feng, Shunjiang Lin, Yutao Liang, Guansheng Fan, Mingbo Liu. CSEE Journal of Power and Energy Systems (SCIE/EI/CSCD), 2024, No. 5, pp. 1977-1993.
With more and more offshore wind power connected to power grids, fluctuations in offshore wind speeds result in risks of high operation costs. To mitigate this problem, a risk-averse stochastic economic dispatch (ED) model of a power system with multiple offshore wind farms (OWFs) is proposed in this paper. In this model, a novel GlueVaR method is used to measure the tail risk of the probability distribution of the operation cost. The weighted sum of the expected operation cost and the GlueVaR is used to reflect the risk of the operation cost, and can flexibly accommodate different risk requirements, including risk aversion and risk neutrality, by adjusting parameters. Then, a risk-averse approximate dynamic programming (ADP) algorithm is designed to solve the proposed model, in which the multi-period ED problem is decoupled into a series of single-period ED problems and GlueVaR is introduced into the approximate value function training process for risk aversion. Finally, a distributed and risk-averse ADP algorithm is constructed based on the alternating direction method of multipliers, which further decouples the single-period ED between the transmission system and multiple OWFs to ensure information privacy. Case studies on the modified IEEE 39-bus system with one OWF and an actual provincial power system with four OWFs demonstrate the correctness and efficiency of the proposed model and algorithm.
Keywords: approximate dynamic programming (ADP); alternating direction method of multipliers; GlueVaR; offshore wind farm; risk-averse; stochastic optimization
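The ADMM decoupling mentioned above can be illustrated in its simplest consensus form: each subsystem minimizes its local cost plus a quadratic penalty tying its local copy to a shared value, followed by an averaging step and a dual update. The scalar quadratic costs below are purely illustrative, not the paper's dispatch subproblems:

```python
import numpy as np

# Each agent i has local cost f_i(x) = 0.5 * a_i * (x - b_i)^2 and all
# must agree on x.  ADMM consensus: local minimize, average, dual update.
a = np.array([1.0, 2.0, 4.0])   # local cost curvatures
b = np.array([0.0, 3.0, 6.0])   # local cost minimizers
rho = 1.0                       # penalty parameter
x = np.zeros(3)                 # local copies of the shared variable
z = 0.0                        # consensus value
u = np.zeros(3)                 # scaled dual variables
for _ in range(100):
    x = (a * b + rho * (z - u)) / (a + rho)   # closed-form local minimization
    z = np.mean(x + u)                        # consensus (averaging) step
    u = u + x - z                             # dual ascent
print(f"consensus value {z:.4f} "
      f"(weighted-average optimum {(a @ b) / a.sum():.4f})")
```

Only the boundary variables and duals need to be exchanged at each iteration, which is what preserves information privacy between the transmission system and the wind farms.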
19. A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications (Cited by 2)
Author: Warren B. POWELL. Journal of Control Theory and Applications (EI), 2011, No. 3, pp. 336-352.
We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued states and actions and complex information processes. We build on the literature that has addressed the well-known problem of multidimensional (and possibly continuous) states, and on the extensive literature on model-free dynamic programming, which also assumes that the expectation in Bellman's equation cannot be computed. However, we point out complications that arise when the actions/controls are vector-valued and possibly continuous. We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures for the value function.
Keywords: approximate dynamic programming; Reinforcement learning; Optimal control; Approximation algorithms
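A core building block of the approximate policy iteration algorithms reviewed above is least-squares policy evaluation with a parametric value function V(s) ≈ phi(s)^T theta. The LSTD-style sketch below fits theta from sampled transitions and is generic, not one of the authors' algorithms:

```python
import numpy as np

def lstd(transitions, phi, gamma=0.95, reg=1e-6):
    """Least-squares TD: fit V(s) ~ phi(s)^T theta from (s, r, s') samples
    collected under a fixed policy; this is the policy-evaluation half of
    an approximate policy iteration loop."""
    k = len(phi(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f = phi(s)
        A += np.outer(f, f - gamma * phi(s_next))   # accumulate TD moments
        b += r * f
    return np.linalg.solve(A + reg * np.eye(k), b)  # regularized solve
```

Re-fitting theta after each greedy improvement step yields a basic approximate policy iteration loop; nonparametric architectures replace phi with, for example, kernel or local-averaging features.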
20. A model-based approximate λ-policy iteration approach to online evasive path planning and the video game Ms. Pac-Man
Authors: Greg FODERARO, Vikram RAJU, Silvia FERRARI. Journal of Control Theory and Applications (EI), 2011, No. 3, pp. 391-399.
This paper presents a model-based approximate λ-policy iteration approach using temporal differences for optimizing paths online for a pursuit-evasion problem, where an agent must visit several target positions within a region of interest while simultaneously avoiding one or more actively pursuing adversaries. The method is relevant to applications such as robotic path planning, mobile-sensor applications, and path exposure. The methodology utilizes cell decomposition to construct a decision tree and implements a temporal-difference-based approximate λ-policy iteration to combine online learning with prior knowledge through modeling, so as to minimize the risk of being caught by an adversary and maximize a reward associated with visiting target locations. Online learning and frequent decision tree updates allow the algorithm to quickly adapt to unexpected adversary movements or dynamic environments. The approach is illustrated through a modified version of the video game Ms. Pac-Man, which is shown to be a benchmark example of the pursuit-evasion problem. The results show that the approach presented in this paper outperforms several other methods as well as most human players.
Keywords: approximate dynamic programming; Reinforcement learning; Path planning; Pursuit-evasion games
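The temporal-difference machinery behind λ-policy iteration blends multi-step returns through an eligibility trace. A minimal tabular TD(λ) policy-evaluation sketch (generic; the paper combines this with cell decomposition and an online decision tree):

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.95, lam=0.8):
    """Tabular TD(lambda) policy evaluation with accumulating traces.
    `episodes` is a list of [(state, reward, next_state), ...] trajectories."""
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)                     # eligibility traces
        for s, r, s_next in episode:
            delta = r + gamma * V[s_next] - V[s]   # one-step TD error
            e *= gamma * lam                       # decay all traces
            e[s] += 1.0                            # bump the visited state
            V += alpha * delta * e                 # credit recently visited states
    return V
```

The λ parameter interpolates between pure one-step TD (λ = 0) and Monte Carlo returns (λ = 1), which is what lets the planner trade bias against variance when adapting online.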