Funding: Supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (USCAST2022-11) and the Aeronautical Science Foundation of China (20220001057001).
Abstract: This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models, with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. Our method removes the dependence on admissible initial policies, which is one of the main drawbacks of policy iteration (PI)-based frameworks. Furthermore, the algorithm enables the players to adapt their control policies without full knowledge of the other players' system parameters or control laws. The efficacy of our method is illustrated by three examples.
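For readers who want to experiment with the value-iteration idea, the sketch below runs a VI-style scheme on a discrete-time two-player LQ game: both value matrices start from zero, so no admissible initial policy is needed, and the players update their gains sequentially. This is a simplified discrete-time stand-in for the paper's continuous-time method, and all system matrices are illustrative.

```python
import numpy as np

# VI-style sketch for a two-player discrete-time LQ game. Players
# update their feedback gains sequentially, starting from zero value
# matrices (no admissible initial policy required). Illustrative data.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1, Q2 = np.eye(2), 2 * np.eye(2)
R1, R2 = np.eye(1), np.eye(1)

P1 = np.zeros((2, 2))   # player 1's value matrix, initialized at zero
P2 = np.zeros((2, 2))
K1 = np.zeros((1, 2))
K2 = np.zeros((1, 2))
for _ in range(500):
    # player 1 adapts first, treating player 2's current gain as fixed
    K1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    # player 2 adapts next, against player 1's updated gain
    K2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
    # one value-iteration sweep for each player's value matrix
    Acl = A - B1 @ K1 - B2 @ K2
    P1 = Q1 + K1.T @ R1 @ K1 + Acl.T @ P1 @ Acl
    P2 = Q2 + K2.T @ R2 @ K2 + Acl.T @ P2 @ Acl

print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(Acl))))
print("K1 =", K1, "\nK2 =", K2)
```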
Funding: Project supported by the National Aeronautics Base Science Foundation of China (No. 2000CB080601) and the National Defence Key Pre-research Program of China during the 10th Five-Year Plan Period (No. 2002BK080602).
Abstract: The resolution of differential games often runs into the difficult two-point boundary value (TPBV) problem, so the linear quadratic differential game is recast as a Hamiltonian system. For Hamiltonian systems, symplectic geometric algorithms have the merit of reproducing the dynamic structure of the system and preserving the measure of the phase plane. From the viewpoint of Hamiltonian systems, the symplectic characters of the linear quadratic differential game are probed, and, as a first attempt, a symplectic Runge-Kutta algorithm is presented for the resolution of the infinite-horizon linear quadratic differential game. A numerical example is given whose results illustrate the feasibility of the method and, at the same time, exhibit the fine energy-conservation properties of symplectic algorithms.
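As a concrete illustration of the symplectic idea, the sketch below applies the implicit midpoint rule, the simplest symplectic Runge-Kutta method, to a linear Hamiltonian system; for linear systems this method preserves the quadratic energy exactly up to round-off. The harmonic oscillator used here is a stand-in, not the paper's game system.

```python
import numpy as np

# Implicit midpoint rule (a one-stage symplectic Runge-Kutta method)
# applied to a linear Hamiltonian system z' = J S z. For linear systems
# the update has the closed form below and conserves the quadratic
# energy H(z) = 0.5 z^T S z up to round-off. Illustrative system only.
S = np.eye(2)                           # H(q, p) = (q^2 + p^2) / 2
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
M = J @ S
h = 0.1                                 # step size
step = np.linalg.solve(np.eye(2) - 0.5 * h * M, np.eye(2) + 0.5 * h * M)

z = np.array([1.0, 0.0])
H0 = 0.5 * z @ S @ z
for _ in range(10000):
    z = step @ z
print("energy drift after 10000 steps:", abs(0.5 * z @ S @ z - H0))
```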
Funding: The Young Research Foundation (201201130) of Jilin Provincial Science & Technology Department and the Research Foundation (2011LG17) of Changchun University of Technology.
Abstract: In this paper, we deal with one kind of two-player zero-sum linear quadratic stochastic differential game problem. We show that an open-loop saddle point exists if and only if the lower and upper values exist.
Funding: This work was supported by the National Key Research & Development Program of China under Grant No. 2022YFA1006104, the National Natural Science Foundations of China under Grant Nos. 11971266 and 11831010, and the Shandong Provincial Natural Science Foundations under Grant Nos. ZR2022JQ01, ZR2020ZD24 and ZR2019ZD42.
Abstract: In this paper, a leader-follower stochastic differential game is studied for a linear stochastic differential equation with quadratic cost functionals. The coefficients in the state equation and the weighting matrices in the cost functionals are all deterministic. Closed-loop strategies are introduced, which are required to be independent of the initial state; this property makes them very useful and convenient in applications. The follower first solves a stochastic linear quadratic optimal control problem, and his optimal closed-loop strategy is characterized by a Riccati equation together with an adapted solution to a linear backward stochastic differential equation. The leader then turns to a stochastic linear quadratic optimal control problem for a forward-backward stochastic differential equation; necessary conditions for the existence of the leader's optimal closed-loop strategy are given by a Riccati equation. Some examples are also given.
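The follower's step reduces to a stochastic LQ problem whose closed-loop strategy is characterized by a Riccati equation. As a simpler numerical touchstone for that step, the sketch below solves the deterministic, infinite-horizon algebraic Riccati equation with SciPy and forms the corresponding feedback; all matrices are illustrative, and the stochastic, finite-horizon ingredients of the paper are not modeled.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Deterministic, infinite-horizon stand-in for the follower's LQ step:
# solve A'P + PA - P B R^{-1} B' P + Q = 0 and form the closed-loop
# feedback u = -Kx. All matrices are illustrative.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)        # stabilizing feedback gain
print("P =\n", P)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```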
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 61603115, 91438202 and 91638301), the China Postdoctoral Science Foundation (No. 2015M81455), the Open Fund of the National Defense Key Discipline Laboratory of Micro-Spacecraft Technology of China (No. HIT.KLOF.MST.201601) and the Heilongjiang Postdoctoral Fund of China (No. LBH-Z15085).
Abstract: This paper studies proximate satellite interception guidance strategies in which both the interceptor and the target can perform orbital maneuvers with magnitude-limited thrusts. This problem is regarded as a pursuit-evasion game, since the satellites on both sides will try their best to capture or escape. In this game the distance between the two players is small enough that the highly nonlinear earth-centered gravitational dynamics can be reduced to the linear Clohessy-Wiltshire (CW) equations. The system is then simplified by introducing zero-effort-miss variables. A saddle solution is formulated for the pursuit-evasion game, and the time-to-go is estimated similarly to that for exoatmospheric interception. A vector guidance law is then derived to ensure that the interception is achieved in the optimal time. The proposed guidance law is validated by numerical simulations.
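The two modeling ingredients named in the abstract, the CW equations and the zero-effort-miss (ZEM) variables, are easy to reproduce numerically. The sketch below builds the CW system matrix and computes the ZEM vector by ballistic propagation of the relative state; the orbital rate, state, and time-to-go are illustrative numbers, and the saddle-point guidance law itself is not reproduced here.

```python
import numpy as np
from scipy.linalg import expm

# Clohessy-Wiltshire relative dynamics and the zero-effort-miss (ZEM)
# vector: the relative position that would result at intercept time if
# neither player thrusted from now on. Numbers are illustrative.
n = 0.0011                  # orbital rate of the reference orbit, rad/s
A = np.array([
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [3 * n**2, 0, 0, 0, 2 * n, 0],
    [0, 0, 0, -2 * n, 0, 0],
    [0, 0, -n**2, 0, 0, 0],
])

def zero_effort_miss(x_rel, t_go):
    """Propagate the relative state ballistically, keep the position."""
    return (expm(A * t_go) @ x_rel)[:3]

x_rel = np.array([1000.0, -500.0, 200.0, -1.0, 0.5, 0.0])  # m and m/s
zem = zero_effort_miss(x_rel, t_go=600.0)
print("ZEM after 600 s:", zem, " |ZEM| =", np.linalg.norm(zem))
```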
Funding: Supported by the Science and Technology Innovation 2030 Key Project of "New Generation Artificial Intelligence", China (No. 2020AAA0108200), the National Natural Science Foundation of China (Nos. 61873011, 61922008, 61973013 and 61803014), the Defense Industrial Technology Development Program, China (No. JCKY2019601C106), the Innovation Zone Project, China (No. 18-163-00-TS-001-00134), the Foundation Strengthening Program Technology Field Fund, China (No. 2019-JCJQ-JJ-243), and the Fund from the Key Laboratory of Dependable Service Computing in Cyber Physical Society, China (No. CPSDSC202001).
Abstract: This paper is concerned with a scenario in which multiple attackers try to intercept a target with active defense. Three types of agents are considered in the guidance: the multiple attackers, the target and the defender, where the attackers aim to pursue the target from different directions while simultaneously evading the defender. The engagement is formulated as a zero-sum two-person differential game between the two opposing teams, so that measurements of the target's maneuvers or estimates of the defender's strategy are not required. Cooperation among the attackers resides in two aspects: redundant interception under the threat of the defender, and the relative intercept geometry with the target. The miss distances, the relative intercept angle errors and the control costs of the agents are combined into a single performance index of the game. This formulation enables a unitary approach to the design of guidance laws for all agents. To minimize the control efforts and miss distances of the attackers, an optimization method is proposed to find the best anticipated miss distances with respect to the defender, under the constraint that the defender is endowed with a capture radius. Numerical simulations with two cases illustrate the effectiveness of the proposed cooperative guidance law.
Funding: Supported by the National Natural Science Foundation of China (No. 61174078), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20103718110006) and a Project of the Shandong Province Higher Educational Science and Technology Program (No. J12LN14).
Abstract: This paper deals with infinite horizon linear quadratic (LQ) differential games for discrete-time stochastic systems with both state- and control-dependent noise. Popov-Belevitch-Hautus (PBH) criteria for exact observability and exact detectability of discrete-time stochastic systems are presented. By means of these criteria, we give the optimal strategies (Nash equilibrium strategies) and the optimal cost values for infinite horizon stochastic differential games. It turns out that the infinite horizon LQ stochastic differential games are associated with four coupled matrix-valued equations. Furthermore, an iterative algorithm is proposed to solve the four coupled equations. Finally, an example is given to demonstrate our results.
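To give a feel for such iterative schemes on coupled equations, the sketch below solves the two coupled Riccati-type equations of a deterministic two-player discrete-time LQ game by alternating exact policy evaluation (a discrete Lyapunov equation per player) with gain updates. It is a deterministic two-equation stand-in for the paper's four coupled stochastic equations; all matrices are illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Iterative treatment of coupled game Riccati equations: freeze the
# current gains, evaluate each player's value matrix exactly through a
# discrete Lyapunov equation, then improve the gains, and repeat.
# Deterministic, illustrative stand-in for the stochastic case.
A = np.array([[0.95, 0.1], [0.0, 0.9]])
B1 = np.array([[1.0], [0.0]]); B2 = np.array([[0.0], [1.0]])
Q1, Q2 = np.eye(2), np.eye(2)
R1, R2 = np.eye(1), np.eye(1)

K1 = np.zeros((1, 2)); K2 = np.zeros((1, 2))
for _ in range(200):
    Acl = A - B1 @ K1 - B2 @ K2
    # policy evaluation: P_i = Acl' P_i Acl + Q_i + K_i' R_i K_i
    P1 = solve_discrete_lyapunov(Acl.T, Q1 + K1.T @ R1 @ K1)
    P2 = solve_discrete_lyapunov(Acl.T, Q2 + K2.T @ R2 @ K2)
    # policy improvement against the other player's current gain
    K1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
print("Nash gains:\nK1 =", K1, "\nK2 =", K2)
```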
Funding: Supported by the National Natural Science Foundation of China (No. 71171061) and the Natural Science Foundation of Guangdong Province (No. S2011010004970).
Abstract: This paper discusses infinite time horizon nonzero-sum linear quadratic (LQ) differential games of stochastic systems governed by Itô's equation with state- and control-dependent noise. First, the nonzero-sum LQ differential games are formulated by applying the results of stochastic LQ problems. Second, under the assumption of mean-square stabilizability of stochastic systems, necessary and sufficient conditions for the existence of the Nash strategy are presented by means of four coupled stochastic algebraic Riccati equations. Moreover, to demonstrate the usefulness of the obtained results, stochastic H2/H∞ control with state-, control- and external-disturbance-dependent noise is discussed as an immediate application.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11701040, 11871010 and 61871058, and the Fundamental Research Funds for the Central Universities under Grant No. 2019XDA11.
Abstract: This paper focuses on zero-sum stochastic differential games in the framework of forward-backward stochastic differential equations on a finite time horizon, with both players adopting impulse controls. By means of BSDE methods, in particular the notion of Peng's stochastic backward semigroup, the authors prove a dynamic programming principle for both the upper and the lower value functions of the game. The upper and the lower value functions are then shown to be the unique viscosity solutions of the Hamilton-Jacobi-Bellman-Isaacs equations with a double obstacle. As a consequence, the uniqueness implies that the upper and lower value functions coincide and the game admits a value.
Funding: Supported by DAAD-PPP Hong Kong/Germany (No. G. HK 036/09).
Abstract: A large class of stochastic differential games for several players is considered in this paper. The class includes Nash differential games as well as Stackelberg differential games, and a mix is possible. The existence of feedback strategies under general conditions is proved. The limitations concern the functionals, in which the state and the controls must appear separately; this is also true for the state equations. The controls appear quadratically in the payoff and linearly in the state equation. The most serious restriction is the dimension of the state equation, which cannot exceed 2. The reason comes from the PDE (partial differential equation) techniques used in studying the system of Bellman equations obtained by dynamic programming arguments. In the authors' previous work in 2002 there is no such restriction, but there are serious restrictions on the structure of the Hamiltonians, which are violated in the applications dealt with in this article.
Funding: Supported by the Agence Nationale de la Recherche (France), reference ANR-10-BLAN 0112, the Marie Curie ITN "Controlled Systems", call FP7-PEOPLE-2007-1-1-ITN, no. 213841-2, the National Natural Science Foundation of China (Nos. 10701050, 11071144), the National Basic Research Program of China (973 Program) (No. 2007CB814904), Shandong Province (No. Q2007A04), the Independent Innovation Foundation of Shandong University, and the Project sponsored by SRF for ROCS, SEM.
Abstract: In this paper we first investigate zero-sum two-player stochastic differential games with reflection, with the help of the theory of reflected backward stochastic differential equations (RBSDEs). We establish the dynamic programming principle for the upper and the lower value functions of this kind of stochastic differential game with reflection in a straightforward way. The upper and the lower value functions are then proved to be the unique viscosity solutions of the associated upper and lower Hamilton-Jacobi-Bellman-Isaacs equations with obstacles, respectively. The method differs significantly from those used for control problems with reflection, and the new techniques developed are of interest in their own right. Further, we also prove a new estimate for RBSDEs that is sharper than that in the paper of El Karoui, Kapoudjian, Pardoux, Peng and Quenez (1997); it turns out to be very useful because it allows us to estimate the L^p-distance of the solutions of two different RBSDEs by the p-th power of the distance of the initial values of the driving forward equations. We also show that the unique viscosity solution of the approximating Isaacs equation constructed by the penalization method converges to the viscosity solution of the Isaacs equation with obstacle.
Abstract: This paper studies the policy iteration algorithm (PIA) for zero-sum stochastic differential games with the basic long-run average criterion, as well as with its more selective version, the so-called bias criterion. The system is assumed to be a nondegenerate diffusion. We use Lyapunov-like stability conditions that ensure the existence and boundedness of the solution to a certain Poisson equation. We also ensure the convergence of a sequence of such solutions, of the corresponding sequence of policies, and, ultimately, of the PIA.
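A runnable caricature of policy iteration for a zero-sum game, in the deterministic LQ setting (the paper's nondegenerate-diffusion, average-criterion setting does not fit in a few lines): each iteration evaluates the frozen pair of policies through a Lyapunov equation, playing the role of the Poisson equation, and then improves both players. All data, including the attenuation level gamma, are illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Policy iteration for a zero-sum LQ differential game: evaluate the
# frozen policies via a Lyapunov equation, then improve minimizer and
# maximizer. Deterministic, illustrative stand-in for the diffusion case.
A = np.array([[-1.0, 0.5], [0.0, -0.8]])   # open-loop stable start
B = np.array([[1.0], [0.0]])               # minimizer channel
D = np.array([[0.0], [1.0]])               # maximizer channel
Q = np.eye(2); R = np.eye(1); gamma = 2.0  # cost: x'Qx + u'Ru - g^2 w'w

K = np.zeros((1, 2)); L = np.zeros((1, 2))
for _ in range(30):
    Acl = A - B @ K + D @ L                # u = -Kx, w = +Lx
    rhs = Q + K.T @ R @ K - gamma**2 * L.T @ L
    P = solve_continuous_lyapunov(Acl.T, -rhs)  # Acl'P + P Acl + rhs = 0
    K = np.linalg.solve(R, B.T @ P)             # minimizer improvement
    L = D.T @ P / gamma**2                      # maximizer improvement
print("game value matrix P =\n", P)
```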
Funding: This work is supported by the National Natural Science Foundation (Grant No. 10371067), the Youth Teacher Foundation of the Fok Ying Tung Education Foundation, the Excellent Young Teachers Program, and the Doctoral Program Foundation of the MOE and Shandong Province, China.
Abstract: In this paper, we use the solutions of forward-backward stochastic differential equations to obtain the explicit form of the optimal control for the linear quadratic stochastic optimal control problem and the open-loop Nash equilibrium point for the nonzero-sum differential game problem. We also discuss the solvability of the generalized Riccati equation system and give the linear feedback regulator for the optimal control problem using the solution of this kind of Riccati equation system.
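In the LQ setting, a linear feedback regulator of the kind mentioned at the end comes from a Riccati equation integrated backward from the terminal cost. The deterministic sketch below does exactly that with SciPy's ODE solver; the generalized (stochastic) Riccati system of the paper is not modeled, and all matrices and the horizon are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Finite-horizon LQ regulator u = -R^{-1} B' P(t) x from the Riccati
# differential equation -dP/dt = A'P + PA - P B R^{-1} B' P + Q,
# integrated backward from P(T) = G. Illustrative data throughout.
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); G = np.eye(2); T = 5.0

def riccati_rhs(t, p):
    P = p.reshape(2, 2)
    dP = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return dP.ravel()

# integrate backward in time by running from t = T down to t = 0
sol = solve_ivp(riccati_rhs, (T, 0.0), G.ravel())
P0 = sol.y[:, -1].reshape(2, 2)
K0 = np.linalg.solve(R, B.T @ P0)
print("P(0) =\n", P0, "\nfeedback gain at t = 0:", K0)
```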
Funding: The National Natural Science Foundation of China, the Outstanding Young Teachers Program of the Ministry of Education of China, the Special Fund for the Ph.D. Program of the Ministry of Education of China, and the Fok Ying Tung Education Foundation.
Abstract: The existence and uniqueness of the solutions of one kind of forward-backward stochastic differential equation, with Brownian motion and a Poisson process as the noise sources, are given under monotone conditions. These results are then applied to nonzero-sum differential games with random jumps to obtain the explicit form of the open-loop Nash equilibrium point from the solution of the forward-backward stochastic differential equations.
Funding: This research is supported by the National Natural Science Fund of China (70371030, 70372041, 79970073), the Postdoctoral Science Fund of China, and the Key Teacher Fund of Chongqing University.
Abstract: Using Stackelberg differential game (SDG) theory, we quantitatively study a problem of optimal intertemporal investment and tax rate design. Under some appropriate assumptions, the open-loop Stackelberg equilibrium solutions are obtained. The equilibrium solutions show that: (1) the optimal strategies derived from the differential game and from unilateral optimal control approaches are different; (2) it is not always the best strategy for the government to use a constant tax rate over the whole time period; (3) the admissible size of tax rate adjustment may have a great effect on the government's optimal strategy; (4) the SDG approach has no significant effect on the firm's optimal investment strategy.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61773098 and the 111 Project under Grant No. B16009.
Abstract: A switched linear quadratic (LQ) differential game over a finite horizon is investigated in this paper. The switching signal is regarded as a non-conventional player, and the definition of Pareto efficiency is extended to dynamics-switching situations to characterize the solutions of this multi-objective problem. Furthermore, the switched differential game is equivalently transformed into a family of parameterized single-objective optimal control problems by introducing preference information and auxiliary variables. This transformation reduces the computational complexity, so that the Pareto frontier of the switched LQ differential game can be constructed by dynamic programming. Finally, a numerical example is provided to illustrate the effectiveness.
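As a toy version of the Pareto-frontier construction, the sketch below enumerates every switching sequence of a two-mode autonomous switched system over a short horizon, evaluates two quadratic objectives, and filters the nondominated cost pairs. Brute-force enumeration stands in for the paper's dynamic program over scalarized subproblems, and all data are illustrative.

```python
import itertools
import numpy as np

# Pareto frontier of a switched system with two quadratic objectives:
# enumerate all switching sequences over a short horizon, compute both
# costs, and keep the nondominated points. Illustrative data only.
A = {0: np.array([[0.9, 0.3], [0.0, 0.7]]),
     1: np.array([[0.6, 0.0], [0.2, 1.0]])}
Q = {0: np.eye(2), 1: np.diag([5.0, 0.2])}   # player-wise stage weights
x0 = np.array([1.0, 1.0]); N = 8

def costs(sigma):
    x, J = x0.copy(), np.zeros(2)
    for s in sigma:
        J += [x @ Q[0] @ x, x @ Q[1] @ x]
        x = A[s] @ x
    return tuple(J)

points = sorted(costs(s) for s in itertools.product((0, 1), repeat=N))
pareto, best = [], np.inf
for j1, j2 in points:                # sorted by J1: a point is kept
    if j2 < best:                    # iff its J2 reaches a new minimum
        pareto.append((j1, j2)); best = j2
print(f"{len(pareto)} Pareto-efficient switching costs out of {2**N}")
```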
Abstract: This paper discusses the capturability with fixed time of a pseudo-linear differential game of pursuit. The pursuit set, in which the pursuit will end once the initial state lies in this set, is given by the method of integration of multi-valued functions. The results obtained here solve an open problem of Pontrjagin's on the linear differential game of pursuit. Meanwhile, the requirement of convexity of the control set and other related Pontrjagin conditions are removed.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 11871309, 11671229, 71871129, 11371226, 11301298), the National Key R&D Program of China (Grant No. 2018YFA0703900), the Natural Science Foundation of Shandong Province (No. ZR2019MA013), the Special Funds of the Taishan Scholar Project (No. tsqn20161041), and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.
Abstract: We study a kind of partial information non-zero-sum differential game of mean-field backward doubly stochastic differential equations, in which the coefficient contains not only the state process but also its marginal distribution, and the cost functional is also of mean-field type. The control is required to be adapted to a sub-filtration of the filtration generated by the underlying Brownian motions. We establish a necessary condition in the form of a maximum principle and a verification theorem, which is a sufficient condition for a Nash equilibrium point. We use the theoretical results to deal with a partial information linear-quadratic (LQ) game, and obtain the unique Nash equilibrium point of our LQ game problem by virtue of the unique solvability of a mean-field forward-backward doubly stochastic differential equation.
Funding: Supported by the Qihang Project of Zhejiang University (Grant No. 202016).
Abstract: Based on differential game theory, the decision-making problem of two homogeneous countries facing transboundary marine litter governance is studied. Assuming the input of marine litter is an exogenous variable, the focus is on reducing the accumulation of marine litter through cleanup and transfer processing by both parties. Considering both constant and increasing inputs of marine litter, and within the framework of international agreement constraints, the game behavior of the players in marine litter governance is compared and analyzed under the open-loop strategy (the case with agreement constraints) and the Markov strategy (the case without agreement constraints). The results show that when the direct pollution cost of marine litter is high enough, both sides of the game adopt an open-loop strategy that complies with the constraints of the agreement, which reduces the accumulation of marine litter and improves environmental quality. However, when the initial accumulation of marine litter is high, the Markov strategy without agreement constraints will be better than the open-loop strategy. In the case where marine litter does not need to be transferred, there is no difference, on the equilibrium growth path, between the two sides adopting the Markov strategy or the open-loop strategy.
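A toy simulation can make the role of the two strategy classes concrete. The sketch below assumes a one-dimensional litter-stock equation, stock' = inflow - cleanup - decay * stock, with a constant cleanup rate standing in for the "open-loop" strategy and a stock-proportional rate for the "Markov" strategy; the dynamics, the parameters and both strategy forms are invented purely for illustration, whereas the paper derives the equilibrium strategies rather than postulating them.

```python
import numpy as np

# Toy litter-stock dynamics: stock' = inflow - cleanup - decay * stock.
# "Open-loop" commits to a constant cleanup rate; "Markov" cleans in
# proportion to the current stock. All numbers are invented.
inflow, decay, dt, T = 2.0, 0.05, 0.1, 200.0
steps = int(T / dt)

def simulate(x0, cleanup):
    x, path = x0, []
    for _ in range(steps):
        path.append(x)
        x += dt * (inflow - cleanup(x) - decay * x)
    return np.array(path)

for x0 in (5.0, 60.0):                        # low vs high initial stock
    ol = simulate(x0, lambda s: 1.5)          # open-loop: constant effort
    mk = simulate(x0, lambda s: 0.15 * s)     # Markov: feedback effort
    print(f"x0={x0:5.1f}  mean stock: open-loop={ol.mean():6.2f}, "
          f"Markov={mk.mean():6.2f}")
```

With these invented numbers both strategies reach the same steady state, but the feedback (Markov) rule cleans much harder when the stock starts high, loosely echoing the abstract's finding that the Markov strategy does better under high initial accumulation.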
Abstract: The missile interception problem can be regarded as a two-person zero-sum differential game, which depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method: for online implementation of the SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on the SPUA is proposed that does not make use of any knowledge of the system dynamics. Then, a neural-network based online adaptive critic implementation scheme of the off-policy IRL algorithm is presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting high-maneuvering targets. As a model-free method, interception can be achieved by measuring system data online. The effectiveness of the CIIG is verified in two missile-target engagement scenarios.
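To convey the off-policy, model-free flavor in a runnable form, the sketch below performs Q-learning-style policy iteration for a discrete-time zero-sum LQ game: transitions are generated with exploratory behavior inputs, a quadratic Q-function of the current target policies is fitted by least squares, and both policies are improved from its blocks. This discrete-time LQ stand-in is not the paper's SPUA-based IRL scheme or its neural-network critic; the true system matrices are used only to simulate data, never by the learner, and all numbers are illustrative.

```python
import numpy as np

# Off-policy, model-free evaluation of a zero-sum LQ game: fit the
# quadratic Q-kernel of the current target policies from exploratory
# data, then improve both players from its blocks. Illustrative data.
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.0]]); D = np.array([[0.0], [0.5]])
Q = np.eye(2); R = np.eye(1); gamma2 = 4.0            # gamma^2, assumed

K = np.zeros((1, 2)); L = np.zeros((1, 2))            # target policies
feat = lambda z: np.outer(z, z).ravel()               # quadratic features

for _ in range(10):                                   # policy iteration
    Phi, y = [], []
    for _ in range(400):                              # off-policy batch
        x = rng.normal(size=2)
        u = rng.normal(size=1); w = rng.normal(size=1)   # exploratory
        xn = A @ x + B @ u + D @ w                    # simulator only
        c = x @ Q @ x + u @ R @ u - gamma2 * (w @ w)  # stage cost
        zn = np.concatenate([xn, -K @ xn, L @ xn])    # target next inputs
        Phi.append(feat(np.concatenate([x, u, w])) - feat(zn))
        y.append(c)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h.reshape(4, 4); H = (H + H.T) / 2            # symmetric kernel
    G = np.linalg.solve(H[2:, 2:], H[2:, :2])         # saddle stationarity
    K, L = G[:1, :], -G[1:, :]                        # u = -Kx, w = Lx
print("learned gains:\nK =", K, "\nL =", L)
```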